Evaluating Trustworthiness of Online News Publishers via Article Classification

John Bianchi; Manuel Pratelli; Marinella Petrocchi; Fabio Pinelli


TL;DR

The paper addresses the automatic estimation of online news publishers' trustworthiness by predicting NewsGuard-derived trust levels from article content. It trains a BERT-based, article-level multiclass classifier on a dataset of 4033 articles from 40 outlets, labeled for both trustworthiness and topic, and evaluates it with F1-Macro and F1-Micro under 10-fold stratified cross-validation. Results show strong topic discrimination (F1-Macro = 0.925, F1-Micro = 0.929) and robust trustworthiness prediction (F1-Macro = 0.843, F1-Micro = 0.882), with misclassifications occurring mainly between Conspiracy and other topics and between adjacent trust levels. The findings suggest that article content provides meaningful signals about publisher trustworthiness, which could enable reader alerts and help journalistic organizations sample and evaluate unfamiliar outlets. Avenues for refinement include expanding the dataset and incorporating explainability. The practical impact lies in scalable, article-driven screening to mitigate exposure to low-credibility content.
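To make the evaluation protocol concrete, the following is a minimal pure-Python sketch of stratified fold assignment and of the two reported metrics. It is an illustration under stated assumptions, not the authors' code: the function names `stratified_folds` and `f1_macro_micro` and the label strings `"high"` and `"low"` are hypothetical, and the fold assignment is a simple unshuffled round-robin rather than whatever procedure the paper actually used.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Split indices into k folds, keeping class proportions roughly equal.

    Assumption: plain round-robin within each class, with no shuffling.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for position, index in enumerate(by_class[label]):
            folds[position % k].append(index)
    return folds


def f1_macro_micro(y_true, y_pred):
    """Return (F1-Macro, F1-Micro) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    classes = sorted(set(y_true) | set(y_pred))
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: F1 over pooled counts; equals accuracy in the single-label case.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


if __name__ == "__main__":
    truth = ["high", "high", "high", "low"]   # hypothetical trust labels
    preds = ["high", "high", "low", "low"]
    print(f1_macro_micro(truth, preds))
    print(stratified_folds(["a"] * 6 + ["b"] * 3, k=3))
```

Because F1-Macro weights every class equally while F1-Micro pools all decisions, a gap between the two (as in the trustworthiness results) usually points to weaker performance on less frequent classes.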

Abstract

The proliferation of low-quality online information has underscored the need for robust, automatic mechanisms to evaluate the trustworthiness of online news publishers. In this paper, we analyse the trustworthiness of online news media outlets by leveraging a dataset of 4033 news stories from 40 different sources. We aim to infer the trustworthiness level of a source from the classification of the content of its individual articles. The trust labels are obtained from NewsGuard, a journalistic organization that evaluates news sources using well-established editorial and publishing criteria. The results indicate that the classification model is highly effective at predicting the trustworthiness level of news articles. This research has practical applications in alerting readers to potentially untrustworthy news sources, assisting journalistic organizations in evaluating new or unfamiliar media outlets, and supporting the selection of articles for trustworthiness assessment.
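One way to picture how article-level predictions could inform a source-level judgement is a simple majority vote over each outlet's articles. The sketch below is only an assumption for illustration: the abstract does not state how, or whether, the authors aggregate predictions per outlet, and the function name `publisher_trust`, the outlet names, and the label strings are all hypothetical.

```python
from collections import Counter, defaultdict


def publisher_trust(article_predictions):
    """Aggregate (publisher, predicted_level) pairs by majority vote.

    Assumption: ties resolve to the level that was seen first for that
    publisher, which is how Counter.most_common orders equal counts.
    """
    votes = defaultdict(Counter)
    for publisher, level in article_predictions:
        votes[publisher][level] += 1
    return {
        publisher: counts.most_common(1)[0][0]
        for publisher, counts in votes.items()
    }


if __name__ == "__main__":
    predictions = [
        ("outlet-a.example", "high"),
        ("outlet-a.example", "high"),
        ("outlet-a.example", "low"),
        ("outlet-b.example", "low"),
    ]
    print(publisher_trust(predictions))
```

A vote of this kind tolerates occasional article-level errors, which matters because the reported confusions fall mostly between adjacent trust levels.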


Paper Structure (15 sections, 2 equations, 7 figures, 3 tables)


Introduction
Results
Applications
Problem definition
Dataset
Online Media Outlets Selection
Articles collection
Data cleaning
Results and discussion
Topic detection
Trustworthiness Detection
Related Work
On the Evaluation of News Publisher's Trustworthiness
Conclusions
Appendix

Figures (7)

Figure 1: Number of articles (top) and news outlets (bottom) per trustworthiness level, broken down by topic.
Figure 2: Evaluation results for topic detection. The yellow dotted line represents the average, while the red line represents the median.
Figure 3: Topic detection: confusion matrices for the folds with the lowest (top) and highest (bottom) F1-Macro.
Figure 4: Evaluation results for trustworthiness level detection. The yellow dotted line represents the average, while the red line represents the median.
Figure 5: Trustworthiness level detection: confusion matrices for the folds with the lowest (top) and highest (bottom) F1-Macro.
