
News Source Credibility Assessment: A Reddit Case Study

Arash Amini, Yigit Ege Bayiz, Ashwin Ram, Radu Marculescu, Ufuk Topcu

TL;DR

This paper addresses the challenge of assessing the credibility of political content shared on Reddit by shifting the focus from truth appraisal to source credibility. It introduces CREDiBERT, a semi-supervised Siamese transformer model trained on millions of paired submissions, and combines it with a weighted post-to-post network that incorporates social-reaction signals while preserving user privacy. The approach improves F1 and overall accuracy over several baselines and enables topic-aware assessment of subreddit susceptibility to low-credibility information. By combining content-based embeddings with interaction-aware graph representations, the work offers a scalable, privacy-conscious framework for platform-specific credibility analysis, with potential applicability beyond Reddit.

Abstract

In the era of social media platforms, identifying the credibility of online content is crucial to combating misinformation. Our main contribution is CREDiBERT (CREDibility assessment using Bi-directional Encoder Representations from Transformers), a source credibility assessment model fine-tuned for Reddit submissions, with a focus on political discourse. We adopt a semi-supervised training approach for CREDiBERT that leverages Reddit's community-based structure. By encoding submission content with CREDiBERT and integrating it into a Siamese neural network, we significantly improve the binary classification of submission credibility, achieving a 9% increase in F1 score compared to existing methods. Additionally, we introduce a new version of the post-to-post network in Reddit that efficiently encodes user interactions and improves the binary classification task by nearly 8% in F1 score. Finally, we employ CREDiBERT to evaluate the susceptibility of subreddits with respect to different topics.
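The Siamese idea mentioned in the abstract can be illustrated with a minimal sketch: one shared encoder is applied to both an anchor submission and a target submission, and the two embeddings are then compared. The encoder below is a toy bag-of-words stand-in, not CREDiBERT, and the names `encode`, `cosine`, and `siamese_similarity` are hypothetical; the paper's actual architecture and training objective are not reproduced here.

```python
import math
from collections import Counter


def encode(text):
    """Toy stand-in for a learned encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def siamese_similarity(anchor, target):
    """Apply the same encoder to both inputs, then compare the embeddings."""
    return cosine(encode(anchor), encode(target))


if __name__ == "__main__":
    anchor = "senate passes budget bill after long debate"
    target = "budget bill passes senate"
    print(siamese_similarity(anchor, target))
```

The point of the sketch is only the weight sharing: both inputs pass through the same `encode` function, so their embeddings live in one space and can be compared directly.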

Paper Structure

This paper contains 23 sections, 10 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Monthly proportion of verified and unverified sources on Reddit. Unverified sources are predominant across subreddits.
  • Figure 2: Fine-grained Siamese network architecture for credibility assessment when an anchor submission is accessible.
  • Figure 3: The post-to-post network for $7,466$ submissions. The network shows a strong separation between different communities. For brevity, only edges with a weight over $0.3$ are shown.
  • Figure 4: The exposure (blue) and reaction (red) scores of six topics in r/Conservative, r/Republican, r/Libertarian, and r/politics. Among all subreddits, r/politics has the highest exposure score, while r/Conservative and r/Republican have the lowest exposure scores. While r/Libertarian shows extreme susceptibility to certain topics, for others it has identical exposure and reaction scores.
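The edge filtering described in the Figure 3 caption (showing only edges with a weight over 0.3) can be sketched as follows. The edge representation, the post identifiers, and the function name `filter_edges` are assumptions made for illustration; how the paper computes the edge weights is not shown here.

```python
def filter_edges(edges, threshold=0.3):
    """Keep only edges whose weight is strictly above the threshold."""
    return {pair: weight for pair, weight in edges.items() if weight > threshold}


# Hypothetical weighted post-to-post edges: (post, post) -> weight.
edges = {
    ("post_a", "post_b"): 0.72,
    ("post_a", "post_c"): 0.10,
    ("post_b", "post_c"): 0.30,
}

visible = filter_edges(edges)

if __name__ == "__main__":
    for (src, dst), weight in visible.items():
        print(src, dst, weight)
```

The comparison is strict, so an edge with a weight of exactly 0.3 is dropped, matching the caption's wording "over 0.3".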