Scalable Detection of Salient Entities in News Articles
Eliyar Asgarieh, Kapil Thadani, Neil O'Hare
TL;DR
This work tackles scalable detection of salient entities in news by fine-tuning RoBERTa-style transformers with two lightweight classification heads: one that tags the spans of the target entity, and one that applies mean+max pooling to contextualized entity representations. Because the pooling head encodes each document once and scores every candidate entity from that single pass, it delivers strong performance on NYT-Salience, WN-Salience, and SEL-Wikinews while using markedly less computation than re-encoding the document for each entity. The authors further show that knowledge distillation from teacher ensembles yields small, well-calibrated models that match or exceed large baselines, with temperature scaling offering additional control over calibration. Empirical analyses indicate that the models capture classic salience signals such as position and frequency and transfer reasonably well across datasets, which makes the approach practical for real-time news systems.
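To make the single-pass idea concrete, here is a minimal sketch of mean+max pooling over entity mentions. It assumes the document has already been encoded once into per-token vectors and that each entity's mention spans are known as token offsets. The helper names (`mean_max_pool`, `pool_all_entities`), the toy vectors, and the choice to pool over all tokens of all mentions are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Span = Tuple[int, int]  # [start, end) token offsets, end exclusive


def mean_max_pool(token_vecs: List[Vector], spans: List[Span]) -> Vector:
    """Concatenate the mean and the element-wise max of all mention tokens."""
    rows = [token_vecs[i] for start, end in spans for i in range(start, end)]
    if not rows:
        raise ValueError("entity has no mention tokens")
    dim = len(rows[0])
    mean = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    peak = [max(r[d] for r in rows) for d in range(dim)]
    return mean + peak  # list concatenation: length is 2 * dim


def pool_all_entities(
    token_vecs: List[Vector], mentions: Dict[str, List[Span]]
) -> Dict[str, Vector]:
    """Reuse one set of token vectors for every candidate entity."""
    return {name: mean_max_pool(token_vecs, spans) for name, spans in mentions.items()}


# Toy stand-in for encoder output (hypothetical): 5 tokens, hidden size 2.
token_vecs = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [5.0, -2.0], [2.0, 4.0]]
mentions = {"entity_a": [(0, 2), (3, 4)], "entity_b": [(4, 5)]}
pooled = pool_all_entities(token_vecs, mentions)
```

In this sketch the encoder cost is paid once per document, and each additional entity only adds a cheap pooling step. Each pooled vector would then feed a classifier that outputs a salience score.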
Abstract
News articles typically mention numerous entities, a large fraction of which are tangential to the story. Detecting the salience of entities in articles is thus important to applications such as news search, analysis and summarization. In this work, we explore new approaches for efficient and effective salient entity detection by fine-tuning pretrained transformer models with classification heads that use entity tags or contextualized entity representations directly. Experiments show that these straightforward techniques dramatically outperform prior work across datasets with varying sizes and salience definitions. We also study knowledge distillation techniques to effectively reduce the computational cost of these models without affecting their accuracy. Finally, we conduct extensive analyses and ablation experiments to characterize the behavior of the proposed models.
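One common way to set up distillation for a binary salience classifier is sketched below. It assumes a student trained against the averaged, temperature-scaled probabilities of several teachers using binary cross-entropy. The function names, the loss form, and the way the temperature is applied are assumptions made for illustration. The text above does not specify the exact recipe.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def soft_target(teacher_logits: List[float], temperature: float = 1.0) -> float:
    """Average the teachers' temperature-scaled probabilities for one entity."""
    probs = [sigmoid(z / temperature) for z in teacher_logits]
    return sum(probs) / len(probs)


def distillation_loss(
    student_logit: float, teacher_logits: List[float], temperature: float = 1.0
) -> float:
    """Binary cross-entropy between the student and the ensemble soft target."""
    target = soft_target(teacher_logits, temperature)
    p = sigmoid(student_logit / temperature)
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps))


# Hypothetical logits from three teachers for a single entity.
teachers = [2.0, 3.0, 4.0]
agreeing = distillation_loss(3.0, teachers)
disagreeing = distillation_loss(-3.0, teachers)
```

Dividing logits by a temperature above 1 pulls probabilities toward 0.5, which is the same mechanism that temperature scaling uses to adjust calibration after training.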
