On Context-aware Detection of Cherry-picking in News Reporting

Israa Jaradat, Haiqi Zhang, Chengkai Li

TL;DR

This work tackles the problem of detecting cherry-picking in news by identifying omitted important statements through cross-narrative context. It formalizes the task as $c_i = \{\exists s \in S_e: f(s,d)=1\} - d_i$ and develops a context-aware framework combining fine-tuned transformers, zero-/few-shot prompting of LLMs, and unsupervised baselines, evaluated on a novel Cherry dataset with 3,346 examples. The main findings show a best F1 of about 0.89 and accuracy around 0.90, with Longformer-large reaching 0.897 accuracy and 0.887 F1 at 500 words of context; results indicate that contextual information from multiple narratives improves detection and that biases in outlets moderately influence measured cherry-picking. The work contributes a publicly released dataset and demonstrates a scalable approach for auditing omission bias in news reporting, with implications for media credibility assessment and automated fact-checking pipelines.
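
The formalization above can be read as a set difference: take the statements about an event that are judged important, then remove those the target story already covers. The following is a minimal Python sketch of that reading, not the authors' implementation. It assumes that `f` is an importance classifier returning 1 or 0, that its second argument is the available context, and that coverage can be checked by normalized string equality (a real system would need semantic matching). The names `omitted_important_statements`, `toy_importance`, and `normalize` are hypothetical and introduced only for illustration.

```python
from typing import Callable, Iterable, List


def normalize(statement: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(statement.lower().split())


def omitted_important_statements(
    event_statements: Iterable[str],
    target_statements: Iterable[str],
    context: str,
    is_important: Callable[[str, str], int],
) -> List[str]:
    """Return important event statements that the target story does not contain.

    Mirrors c_i = {s in S_e : f(s, d) = 1} - d_i under the assumptions above.
    """
    covered = {normalize(s) for s in target_statements}
    omitted = []
    for s in event_statements:
        # Keep s only if it is judged important AND missing from the target story.
        if is_important(s, context) == 1 and normalize(s) not in covered:
            omitted.append(s)
    return omitted


def toy_importance(statement: str, context: str) -> int:
    # Hypothetical stand-in for f: flags any statement that contains a digit.
    # The paper uses language models here; this stub only makes the sketch runnable.
    return 1 if any(ch.isdigit() for ch in statement) else 0


# S_e: statements about the event gathered across news sources (made-up examples).
S_e = [
    "The bill passed with 52 votes.",
    "Critics say the bill raises costs by 10 percent.",
    "The vote took place on a rainy day.",
]
# d_i: statements found in the target story.
d_i = ["The bill passed with 52 votes."]

c_i = omitted_important_statements(S_e, d_i, context="", is_important=toy_importance)
print(c_i)
```

In this toy run, the second statement is the only one that is both flagged as important and absent from the target story, so it is the single candidate for a cherry-picked omission.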

Abstract

Cherry-picking refers to the deliberate selection of evidence or facts that favor a particular viewpoint while ignoring or distorting evidence that supports an opposing perspective. Manually identifying cherry-picked statements in news stories can be challenging. In this study, we introduce a novel approach to detecting cherry-picked statements by identifying missing important statements in a target news story using language models and contextual information from other news sources. Furthermore, this research introduces a novel dataset specifically designed for training and evaluating cherry-picking detection models. Our best performing model achieves an F-1 score of about 89% in detecting important statements. Moreover, results show the effectiveness of incorporating external knowledge from alternative narratives when assessing statement importance.

Paper Structure (21 sections, 1 equation, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Data collection and preparation pipeline.
  • Figure 2: Effect of context size (measured in words) on models' performance.
  • Figure 3: Model performance when using a context collected and summarized in different lengths from biased news sources instead of a neutral source.
  • Figure 4: Model performance when using a context collected from biased news sources and summarized in 500 words then trimmed at different lengths.
  • Figure 5: Cherry-picking data annotation interface.
  • ...and 2 more figures