Predicting Sentence-Level Factuality of News and Bias of Media Outlets

Francielle Vargas, Kokil Jaidka, Thiago A. S. Pardo, Fabrício Benevenuto

TL;DR

This work addresses automated credibility assessment at scale by introducing FactNews, a large sentence-level dataset in Brazilian Portuguese with 6,191 annotated sentences drawn from 100 news stories, built to support prediction of sentence-level factuality and media bias. It presents two BERT-based baselines, one for factuality and one for bias, and evaluates additional feature sets (part-of-speech tags, polarity and emotion lexicons, TF-IDF). Annotation follows the AllSides guidelines and uses three labels (factual spans, biased spans covering 12 bias types, and quotes), with high inter-annotator agreement (κ = 0.82). The key findings are that biased sentences are longer and more emotionally loaded, while factual sentences are more impartial. The best models reach 88% F1 for factuality and 67% F1 for bias detection, which suggests that fine-grained, language-specific reliability signals are useful for media scrutiny and fact-checking in Portuguese-speaking contexts.
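The agreement figure (κ) reported above can be made concrete with a small sketch. The summary does not say which kappa variant the authors used, so this example assumes Cohen's kappa for two annotators. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from FactNews.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations using the three label names from the summary.
ann_1 = ["factual", "biased", "quote", "factual", "biased", "factual"]
ann_2 = ["factual", "biased", "quote", "biased", "biased", "factual"]
kappa = cohen_kappa(ann_1, ann_2)
print(round(kappa, 3))
```

On this toy input the annotators agree on five of six items, and kappa comes out lower than the raw agreement rate because part of that agreement is expected by chance.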

Abstract

Automated news credibility assessment and fact-checking at scale require accurate prediction of news factuality and media bias. This paper introduces a large sentence-level dataset, titled "FactNews", composed of 6,191 sentences expertly annotated according to the factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources by formulating two text classification problems: predicting the sentence-level factuality of news reporting and predicting the bias of media outlets. Our experiments show that biased sentences contain more words than factual sentences and show a predominance of emotion. The fine-grained analysis of subjectivity and impartiality in news articles yielded promising results for predicting the reliability of media outlets. Finally, given the severity of fake news and political polarization in Brazil and the lack of research for Portuguese, both the dataset and the baseline were developed for Brazilian Portuguese.
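The sentence-length finding can be illustrated with a minimal sketch that averages word counts per label. It assumes plain whitespace tokenization, which may differ from the authors' procedure. The helper `mean_words_per_label` and the English toy sentences are hypothetical; FactNews itself is in Brazilian Portuguese.

```python
from statistics import mean


def mean_words_per_label(sentences):
    """Average whitespace-token count for each label."""
    lengths = {}
    for text, label in sentences:
        lengths.setdefault(label, []).append(len(text.split()))
    return {label: mean(counts) for label, counts in lengths.items()}


# Hypothetical toy data, for illustration only.
toy = [
    ("The senate approved the bill on Tuesday.", "factual"),
    ("The vote ended 45 to 30.", "factual"),
    ("The shameful bill was rammed through by reckless senators on Tuesday.", "biased"),
]
avg = mean_words_per_label(toy)
print(avg)
```

A real comparison would run over all annotated sentences and would likely include a significance test, which this sketch omits.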

Paper Structure

This paper contains 20 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: FactNews annotation schema.
  • Figure 2: The cross-domain distribution of factual and biased sentences from different media outlets.