How to Discern Important Urgent News?
Oleg Vasilyev, John Bohannon
TL;DR
This work tackles rapid identification of important urgent news (IUN) without resorting to costly LLM scoring. It shows that simple cluster-level features, particularly the distance-based measure $D^{90}_{50} = D_{90} - D_{50}$ computed on embeddings reduced by UMAP and clustered with HDBSCAN, KMeans, or Agglomerative methods, correlate strongly with LLM-derived IUN scores across four news datasets and multiple configurations. The results support using clustering-derived IUN proxies as fast surrogates for LLM scoring, enabling quick ranking and filtering of news by importance and urgency. This approach offers a scalable, cost-efficient alternative with practical impact for trend monitoring, coverage planning, and real-time news curation.
Abstract
We found that a simple property of clusters in a clustered dataset of news correlates strongly with the importance and urgency of news (IUN) as assessed by an LLM. We verified this finding across different news datasets, dataset sizes, clustering algorithms, and embeddings. This correlation should allow clustering to be used (as an alternative to an LLM) for identifying the most important urgent news, or for filtering out unimportant articles.
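To make the cluster-level measure $D^{90}_{50} = D_{90} - D_{50}$ concrete, the following is a minimal sketch, not the authors' implementation. It assumes that $D_{p}$ is the $p$-th percentile of the Euclidean distances from a cluster's members to the cluster centroid; this section does not spell out the definition, so that reading is an assumption. The function name `d90_50_per_cluster` is hypothetical, and the embeddings are assumed to be already reduced and clustered, with the label `-1` treated as noise, as HDBSCAN reports it.

```python
import numpy as np


def d90_50_per_cluster(embeddings, labels):
    """Return {cluster_label: D90 - D50} for each cluster.

    Assumption: D_p is the p-th percentile of the Euclidean distances
    from the cluster's members to the cluster centroid.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    scores = {}
    for label in np.unique(labels):
        if label == -1:
            # Skip points that the clustering marked as noise.
            continue
        members = embeddings[labels == label]
        centroid = members.mean(axis=0)
        # Distance of every member to the centroid of its cluster.
        dists = np.linalg.norm(members - centroid, axis=1)
        d50, d90 = np.percentile(dists, [50, 90])
        scores[int(label)] = float(d90 - d50)
    return scores


# Toy example with 1-D "embeddings": one tight cluster, one spread-out cluster.
toy_embeddings = np.array([[5.0], [5.0], [5.0], [5.0],
                           [-2.0], [-1.0], [0.0], [1.0], [2.0]])
toy_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
toy_scores = d90_50_per_cluster(toy_embeddings, toy_labels)
```

In practice the resulting per-cluster values would be compared against LLM-derived IUN scores to check the correlation. The sign and strength of that correlation are empirical results and are not assumed by this sketch.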
