CrediRAG: Network-Augmented Credibility-Based Retrieval for Misinformation Detection in Reddit

Ashwin Ram, Yigit Ege Bayiz, Arash Amini, Mustafa Munir, Radu Marculescu

TL;DR

CrediRAG is the first fake news detection model to combine two components: language models with access to a rich external political knowledge base, and a dense social network. Together they detect fake news across social media at scale.

Abstract

Fake news threatens democracy and exacerbates polarization and division in society; accurately detecting online misinformation is therefore the foundation for addressing this issue. We present CrediRAG, the first fake news detection model that combines language models that have access to a rich external political knowledge base with a dense social network to detect fake news across social media at scale. CrediRAG first uses a news retriever to assign a misinformation score to each post, based on the source credibility of news articles similar to the post's title. CrediRAG then refines these initial retrieval estimates through a novel weighted post-to-post network, in which posts are connected when they share commenters and each edge is weighted by the average stance of all shared commenters across the pair of posts. We achieve an 11% increase in F1-score in detecting misinformative posts over state-of-the-art methods. Extensive experiments on curated real-world Reddit data of over 200,000 posts demonstrate the superior performance of CrediRAG over existing baselines. Our approach thus offers a more accurate and scalable solution for combating the spread of fake news across social media platforms.
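
The initial scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that source credibility is a number in [0, 1], that the misinformation score is one minus the mean credibility of the retrieved sources, and that unknown sources fall back to a neutral default. The table `SOURCE_CREDIBILITY`, its domains and values, and the function name `initial_misinformation_score` are all hypothetical.

```python
from statistics import mean

# Hypothetical credibility table in [0, 1]; the domains and values are made up.
SOURCE_CREDIBILITY = {
    "example-wire.com": 0.9,
    "example-tabloid.com": 0.2,
    "example-blog.net": 0.4,
}


def initial_misinformation_score(retrieved_sources, credibility=None, default=0.5):
    """Return 1 - mean credibility of the sources of the retrieved articles.

    Assumptions (not from the paper): sources missing from the table get
    `default`, and an empty retrieval result also yields `default`.
    """
    if credibility is None:
        credibility = SOURCE_CREDIBILITY
    if not retrieved_sources:
        return default
    scores = [credibility.get(src, default) for src in retrieved_sources]
    return 1.0 - mean(scores)


# A post whose title retrieves one high-credibility and one low-credibility source.
score = initial_misinformation_score(["example-wire.com", "example-tabloid.com"])
print(round(score, 2))
```

In this sketch a higher score means the retrieved articles come from less credible sources. How the paper maps credibility to a score or a label is not stated in this section.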

Paper Structure

This paper contains 49 sections, 9 equations, 3 figures, 2 tables, 2 algorithms.

Figures (3)

  • Figure 1: Overall Framework of CrediRAG. The social media posts ⓐ are the nodes in the weighted graph. We use retrieval-augmented generation (RAG) to obtain all news articles related to a given post. The average credibility of the sources of the retrieved articles gives an initial estimate of how misinformative each post in the graph is. A corrective graph attention network (GAT), shown in ⓒ, is adversarially trained to refine labels based on the post-to-post network ⓑ. This GAT corrects the RAG labels to give a final estimate of the binary label for each node.
  • Figure 2: Example of how our post-to-post graph is built. We link two posts if they share at least one commenter. The edge weight is the product of a shared user's stances toward the two posts, averaged over all such shared users. This weighting scheme contains non-trivial information that our algorithm CrediRAG leverages for effective detection of misinformation across Reddit.
  • Figure 3: ROC and Calibration Curves for ISOT Reddit and r/Fakeddit Datasets on r/SandersForPresident, r/EnoughTrumpSpam, and r/DonaldTrumpWhiteHouse subreddits. As shown, CrediRAG (blue curve) gets the best results on the ROC curve, while also being well-calibrated.
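
The graph construction described in the Figure 2 caption can be sketched as follows. Two posts are linked when they share at least one commenter, and the edge weight is the product of a shared commenter's stances toward the two posts, averaged over all shared commenters. This is a minimal sketch, not the authors' code. The stance encoding (+1 for supportive, -1 for opposing), the input layout, and the function name `build_post_graph` are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_post_graph(stances):
    """Build weighted post-to-post edges from per-user stances.

    stances: dict mapping (user, post) -> numeric stance (assumed +1 / -1 here).
    Returns a dict mapping (post_a, post_b), with post_a < post_b, to the mean
    over shared commenters of stance(user, post_a) * stance(user, post_b).
    Pairs of posts with no shared commenter get no edge.
    """
    # Group stances by post: post -> {user: stance}.
    by_post = defaultdict(dict)
    for (user, post), stance in stances.items():
        by_post[post][user] = stance

    edges = {}
    for p, q in combinations(sorted(by_post), 2):
        shared = by_post[p].keys() & by_post[q].keys()
        if not shared:
            continue
        products = [by_post[p][u] * by_post[q][u] for u in shared]
        edges[(p, q)] = sum(products) / len(shared)
    return edges


# Toy example: alice and bob comment on p1 and p2; carol comments on p2 and p3.
toy_stances = {
    ("alice", "p1"): 1, ("alice", "p2"): 1,
    ("bob", "p1"): -1, ("bob", "p2"): 1,
    ("carol", "p2"): -1, ("carol", "p3"): -1,
}
print(build_post_graph(toy_stances))
# {('p1', 'p2'): 0.0, ('p2', 'p3'): 1.0}
```

In the toy example, alice agrees with both p1 and p2 (product +1) while bob opposes p1 and supports p2 (product -1), so the p1-p2 weight averages to 0. The sketch enumerates every pair of posts, which is quadratic in the number of posts. At the scale reported in the abstract, an index from each user to the posts they commented on would avoid visiting pairs with no shared commenter.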