RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models

M. Abdul Khaliq; P. Chang; M. Ma; B. Pflugfelder; F. Miletić


TL;DR

This work tackles political misinformation, including multimodal claims, by introducing RAG-Augmented Reasoning (RAGAR) with two novel techniques: Chain of RAG (CoRAG) and Tree of RAG (ToRAG). The authors build a four-stage multimodal fact-checking pipeline: (1) multimodal claim generation, which verbalizes the claim together with its image context; (2) multimodal evidence retrieval; (3) reasoning over the evidence with either a sequential (CoRAG) or a branching (ToRAG) strategy; and (4) veracity prediction with an accompanying explanation. Evaluated on a PolitiFact-derived subset of the MOCHEG dataset, ToRAG combined with Chain-of-Thought Veracity Prediction and Chain of Verification (CoTVP+CoVe) achieves a weighted F1 of 0.85, outperforming the baselines, and human annotations confirm that the generated explanations cover most of the gold-standard explanation content. The study demonstrates that incorporating multimodal evidence and structured RAG-based reasoning improves both veracity accuracy and explanation quality, while also highlighting limitations in dataset scope, retrieval determinism, and ethical deployment considerations.
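The difference between the sequential (CoRAG) and branching (ToRAG) reasoning strategies can be sketched as two loops. This is a minimal sketch under stated assumptions, not the authors' implementation: the callables `ask`, `retrieve`, `answer`, `score` and `stop` are hypothetical stand-ins for LLM prompts and the evidence-retrieval backend, and the defaults `branches=3` and `max_steps=6` are assumed values.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QA:
    question: str
    answer: str
    evidence: str


# Hypothetical interfaces standing in for LLM prompts and a search backend.
AskFn = Callable[[str, List[QA]], List[str]]  # propose follow-up question(s)
RetrieveFn = Callable[[str], str]             # fetch evidence for one question
AnswerFn = Callable[[str, str], str]          # answer a question from evidence
ScoreFn = Callable[[str, QA], float]          # rate how useful a QA pair is
StopFn = Callable[[str, List[QA]], bool]      # decide if evidence is sufficient


def corag(claim: str, ask: AskFn, retrieve: RetrieveFn, answer: AnswerFn,
          stop: StopFn, max_steps: int = 6) -> List[QA]:
    """Chain: one question per step, each conditioned on all prior QA pairs."""
    history: List[QA] = []
    for _ in range(max_steps):
        question = ask(claim, history)[0]
        evidence = retrieve(question)
        history.append(QA(question, answer(question, evidence), evidence))
        if stop(claim, history):
            break
    return history


def torag(claim: str, ask: AskFn, retrieve: RetrieveFn, answer: AnswerFn,
          score: ScoreFn, stop: StopFn, branches: int = 3,
          max_steps: int = 6) -> List[QA]:
    """Tree: answer several candidate questions per step, keep the best branch."""
    history: List[QA] = []
    for _ in range(max_steps):
        candidates = []
        for question in ask(claim, history)[:branches]:
            evidence = retrieve(question)
            candidates.append(QA(question, answer(question, evidence), evidence))
        if not candidates:
            break
        # Only the highest-scoring QA pair is carried forward as context.
        history.append(max(candidates, key=lambda qa: score(claim, qa)))
        if stop(claim, history):
            break
    return history


# Toy stubs (hypothetical) so the loops can run without an LLM.
def toy_ask(claim: str, history: List[QA]) -> List[str]:
    return [f"q{len(history)}-{b}" for b in range(3)]


def toy_retrieve(question: str) -> str:
    return f"evidence for {question}"


def toy_answer(question: str, evidence: str) -> str:
    return f"answer to {question}"


def toy_score(claim: str, qa: QA) -> float:
    return float(qa.question.endswith("-2"))


def toy_stop(claim: str, history: List[QA]) -> bool:
    return len(history) >= 2


chain = corag("claim", toy_ask, toy_retrieve, toy_answer, toy_stop)
tree = torag("claim", toy_ask, toy_retrieve, toy_answer, toy_score, toy_stop)
print([qa.question for qa in chain])
print([qa.question for qa in tree])
```

With these stubs, the chain keeps the first proposed question at each step, while the tree answers all three candidates and keeps whichever one the scorer prefers. The design trade-off is cost: ToRAG issues roughly `branches` times as many retrieval and answering calls per step in exchange for choosing among alternatives.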

Abstract

The escalating challenge of misinformation, particularly in political discourse, requires advanced fact-checking solutions; this is even clearer in the more complex scenario of multimodal claims. We tackle this issue using a multimodal large language model in conjunction with retrieval-augmented generation (RAG), and introduce two novel reasoning techniques: Chain of RAG (CoRAG) and Tree of RAG (ToRAG). They fact-check multimodal claims by extracting both textual and image content, retrieving external information, and reasoning about which follow-up questions to answer based on prior evidence. We achieve a weighted F1-score of 0.85, surpassing a baseline reasoning technique by 0.14 points. Human evaluation confirms that the vast majority of our generated fact-check explanations contain all information from gold standard data.
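The reported metric, weighted F1, averages the per-class F1 scores with each class weighted by its support (its number of true instances), so frequent veracity labels count more. The sketch below computes it from scratch for illustration; the three labels and the predictions are toy values, not data from the paper's evaluation.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight each class by its share of true labels
    return score


# Toy labels, for illustration only.
y_true = ["supported", "refuted", "refuted", "nei"]
y_pred = ["supported", "refuted", "nei", "nei"]
print(weighted_f1(y_true, y_pred))
```

Here the per-class F1 scores are 1, 2/3 and 2/3 with weights 1/4, 2/4 and 1/4, giving 0.25 + 1/3 + 1/6 = 0.75. The result should agree with scikit-learn's `f1_score(y_true, y_pred, average="weighted")`, since classes that appear only in the predictions receive zero weight in both.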

Paper Structure (30 sections, 14 figures, 1 table)

Introduction
Related Work
Retrieval-Augmented Generation (RAG) for Fact-Checking
Multimodal Fact-Checking using LLMs
Dataset
Multimodal Fact-Checking Pipeline
Multimodal Claim Generation
Multimodal Evidence Retrieval
LLM-Based and RAG-Augmented Reasoning for Fact-Checking
Baseline: Sub-questions with Chain of Thought at Veracity Prediction (SubQ+CoTVP)
RAG-Augmented Reasoning: Chain of RAG (CoRAG)
RAG-Augmented Reasoning: Tree of RAG (ToRAG)
Veracity Prediction and Explanation
Evaluation and Results
Correctness of Veracity Predictions
...and 15 more sections

Figures (14)

Figure 1: An overview of the fact-checking pipeline contrasting the baseline Sub-Question Generation approach from the Chain of RAG and Tree of RAG approach followed by veracity prediction and explanation.
Figure 2: A detailed overview of the Multimodal Fact-checking pipeline
Figure 3: Chain of RAG and Tree of RAG pipeline
Figure 4: Number of 1/2/3 ratings received for explanations by each approach
Figure 5: Annotation Instructions
...and 9 more figures
