Scalable Detection of Salient Entities in News Articles

Eliyar Asgarieh; Kapil Thadani; Neil O'Hare

TL;DR

This work tackles scalable detection of salient entities in news by fine-tuning RoBERTa-style transformers with two lightweight classification heads: one that tags entity spans and one that pools contextual representations (concatenated mean and max pooling) over an entity's tokens. Pooling lets a document be encoded once for all candidate entities, which markedly reduces computation compared to re-encoding the document per entity while delivering strong performance on NYT-Salience, WN-Salience, and SEL-Wikinews. The authors further show that knowledge distillation from teacher ensembles yields small, well-calibrated models that match or exceed large baselines, with temperature scaling providing control over calibration. Empirical analyses show that the models capture classic salience signals such as position and frequency and transfer reasonably across datasets, making the approach practical for real-time news systems.
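The single-pass pooling idea can be illustrated with a minimal NumPy sketch. It assumes the encoder has already produced one contextual vector per token; the random `token_states`, the mention spans, the layer sizes, and the helper names `pool_entity` and `salience_scores` are all hypothetical stand-ins, not the authors' implementation. The 2-layer MLP with ReLU and sigmoid output follows the Figure 2 caption.

```python
import numpy as np

def pool_entity(token_states, mention_spans):
    """Concatenate max- and mean-pooled states over all tokens of one entity's mentions."""
    idx = [i for start, end in mention_spans for i in range(start, end)]
    states = token_states[idx]                      # (num_entity_tokens, hidden)
    return np.concatenate([states.max(axis=0), states.mean(axis=0)])  # (2 * hidden,)

def salience_scores(token_states, entities, w1, b1, w2, b2):
    """Score every candidate entity from a single encoder pass with a 2-layer MLP."""
    feats = np.stack([pool_entity(token_states, spans) for spans in entities])
    hidden = np.maximum(feats @ w1 + b1, 0.0)       # ReLU
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))            # sigmoid salience probability

# Hypothetical sizes and random values standing in for real encoder output and weights.
rng = np.random.default_rng(0)
hidden_size, mlp_size = 8, 16
token_states = rng.normal(size=(12, hidden_size))   # one document, 12 tokens
entities = [[(0, 2), (7, 8)], [(3, 5)]]             # mention spans as [start, end) token ranges
w1 = 0.1 * rng.normal(size=(2 * hidden_size, mlp_size))
b1 = np.zeros(mlp_size)
w2 = 0.1 * rng.normal(size=mlp_size)
b2 = 0.0

scores = salience_scores(token_states, entities, w1, b1, w2, b2)
print(scores)  # one probability per candidate entity
```

Because `token_states` is computed once and only indexed per entity, adding more candidate entities costs a few small matrix products rather than another pass through the encoder.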

Abstract

News articles typically mention numerous entities, a large fraction of which are tangential to the story. Detecting the salience of entities in articles is thus important to applications such as news search, analysis and summarization. In this work, we explore new approaches for efficient and effective salient entity detection by fine-tuning pretrained transformer models with classification heads that use entity tags or contextualized entity representations directly. Experiments show that these straightforward techniques dramatically outperform prior work across datasets with varying sizes and salience definitions. We also study knowledge distillation techniques to effectively reduce the computational cost of these models without affecting their accuracy. Finally, we conduct extensive analyses and ablation experiments to characterize the behavior of the proposed models.
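As a rough illustration of distilling a teacher ensemble into a smaller student, the sketch below computes a soft-target binary cross-entropy with temperature-scaled logits. The loss form, the averaging of teacher probabilities, and the names `distillation_loss`, `t_teacher`, and `t_student` are assumptions made for this example; this page does not specify the exact objective used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distillation_loss(student_logits, teacher_logits, t_teacher=1.0, t_student=1.0):
    """Binary cross-entropy between averaged teacher probabilities and student probabilities.

    teacher_logits has shape (num_teachers, num_entities); student_logits has
    shape (num_entities,). Both are divided by a temperature before the sigmoid.
    """
    soft_targets = sigmoid(teacher_logits / t_teacher).mean(axis=0)  # assumed: average the ensemble
    student_probs = sigmoid(student_logits / t_student)
    eps = 1e-12                                                      # guards against log(0)
    bce = -(soft_targets * np.log(student_probs + eps)
            + (1.0 - soft_targets) * np.log(1.0 - student_probs + eps))
    return bce.mean()

# Hypothetical logits: two teachers scoring three candidate entities.
teacher_logits = np.array([[2.0, -1.0, 0.5],
                           [1.5, -0.5, 0.0]])
student_logits = np.array([1.8, -0.8, 0.2])

print(distillation_loss(student_logits, teacher_logits, t_teacher=2.0, t_student=2.0))
print(distillation_loss(student_logits, teacher_logits, t_teacher=2.0, t_student=1.0))
```

The two calls mirror the two settings contrasted in the Figure 5 caption: the student temperature set equal to the teacher temperature, or fixed to 1.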

Paper Structure (22 sections, 5 equations, 5 figures, 10 tables)

Introduction
Related Work
Modeling Salience with Transformers
Entity classification
Tagging entity spans
Pooling representations
Knowledge Distillation
Datasets
Experiments
Implementation Details
Comparison against Baselines
Ablation of Model Variants
Evaluation of Knowledge Distillation
Stratified Analysis of Results
Conclusion
...and 7 more sections

Figures (5)

Figure 1: Fragments of a labeled news article from the NYT-Salience dataset. Salient entities (highlighted in orange) include Dan Hawkins, Zen and University of California-Davis while the remaining entities (underlined in blue) are not salient.
Figure 2: Model architectures using tagging and pooling for salience estimation. The classifier layer is a 2-layer MLP with ReLU activations and sigmoid output. Pooling concatenates max- and mean-pooled contextual representations for the highlighted tokens.
Figure 3: Stratification of results by relative entity position, i.e., whether the first occurrence of a unique mention is within the first $x\%$ of words in the document. Above: AP using the RoBERTa-Large pooling model. Below: Fraction of positive ground truth labels (i.e., salient entities) in each dataset.
Figure 4: Stratification of results by the frequency of an entity. Above: AP using the RoBERTa-Large pooling model. Below: Fraction of positive ground truth labels (i.e., salient entities) in each dataset.
Figure 5: Calibration (ECE $\downarrow$) and average precision (AP $\uparrow$) metrics of distilled pooling models as the teacher model temperature $T_\text{teacher}$ is varied during distillation on the WN-Salience dataset. Blue lines show metrics when the student model temperature $T_\text{student}$ is set equal to $T_\text{teacher}$, while red lines show $T_\text{student}$ fixed to 1. Solid lines show RoBERTa-Base student models while dashed lines indicate DistilRoBERTa models. Highlighted points ($\circ$) indicate the best result for each plot.
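The ECE metric in Figure 5 can be sketched as follows for binary salience predictions. This is a common equal-width-bin formulation that compares the mean predicted probability with the observed fraction of salient entities in each bin; the number of bins and this particular binary variant are assumptions, since the paper's exact binning is not given on this page.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and observed positive rate per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Assign each prediction to an equal-width bin; clip so that 1.0 lands in the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap          # weight by the fraction of predictions in the bin
    return ece

# Hypothetical predictions: overconfident scores raise the ECE.
probs = [0.95, 0.95, 0.95, 0.95]
labels = [1, 1, 1, 0]
print(expected_calibration_error(probs, labels))
```

A perfectly calibrated model gives an ECE of 0, so lower values are better, matching the downward arrow in the caption.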
