News Source Credibility Assessment: A Reddit Case Study
Arash Amini, Yigit Ege Bayiz, Ashwin Ram, Radu Marculescu, Ufuk Topcu
TL;DR
This paper addresses the challenge of determining the credibility of political content on Reddit by shifting the focus from truth appraisal to source credibility. It introduces CREDiBERT, a transformer model trained in a semi-supervised, Siamese setup on millions of paired submissions, and combines it with a weighted post-to-post network that incorporates social-reaction signals while preserving user privacy. The Siamese setup improves F1 by 9% over existing methods, the post-to-post network improves F1 by nearly 8%, and overall accuracy also improves over several baselines. The model also enables topic-aware assessments of subreddit susceptibility to low-credibility information. By combining content-based embeddings with interaction-aware graph representations, the work offers a scalable, privacy-conscious framework for platform-specific credibility analysis with potential applicability beyond Reddit.
Abstract
In the era of social media platforms, identifying the credibility of online content is crucial to combating misinformation. Our main contribution is CREDiBERT (CREDibility assessment using Bi-directional Encoder Representations from Transformers), a source credibility assessment model fine-tuned for Reddit submissions, with a focus on political discourse. We adopt a semi-supervised training approach for CREDiBERT that leverages Reddit's community-based structure. By encoding submission content with CREDiBERT and integrating it into a Siamese neural network, we significantly improve the binary classification of submission credibility, achieving a 9% increase in F1 score compared to existing methods. Additionally, we introduce a new version of the post-to-post network for Reddit that efficiently encodes user interactions and improves the binary classification task by nearly 8% in F1 score. Finally, we employ CREDiBERT to evaluate the susceptibility of subreddits with respect to different topics.
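The post-to-post network idea can be illustrated with a minimal sketch. The abstract does not specify how edges are weighted, so the sketch assumes that two posts are linked when at least one user interacted with both, and that the edge weight is the number of such shared users. The function name `build_post_to_post_network` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_post_to_post_network(interactions):
    """Return {(post_a, post_b): weight} for posts sharing at least one user.

    `interactions` is an iterable of (user_id, post_id) pairs. The weight is
    the number of distinct users who interacted with both posts. This
    weighting is an assumption made for illustration only.
    """
    posts_by_user = defaultdict(set)
    for user_id, post_id in interactions:
        posts_by_user[user_id].add(post_id)

    edge_weights = defaultdict(int)
    for posts in posts_by_user.values():
        # Each user adds one unit of weight to every pair of posts they
        # touched; the user id itself is never stored on the edge.
        for post_a, post_b in combinations(sorted(posts), 2):
            edge_weights[(post_a, post_b)] += 1
    return dict(edge_weights)


# Hypothetical toy data: three users interacting with three posts.
toy_interactions = [
    ("u1", "p1"), ("u1", "p2"),
    ("u2", "p1"), ("u2", "p2"), ("u2", "p3"),
    ("u3", "p3"),
]
network = build_post_to_post_network(toy_interactions)
print(network)
```

Because the resulting graph holds only post identifiers and aggregate counts, it keeps an interaction signal without retaining individual user identities, which is consistent with the privacy-conscious framing in the TL;DR.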
