Resolving Conflicting Evidence in Automated Fact-Checking: A Study on Retrieval-Augmented LLMs
Ziyu Ge, Yuhao Wu, Daniel Wai Kit Chin, Roy Ka-Wei Lee, Rui Cao
TL;DR
This work tackles automated fact-checking when retrieved evidence conflicts and comes from sources of varying credibility. It introduces CONFACT, a dataset that pairs claims with conflicting evidence and source credibility annotations, and evaluates retrieval-augmented generation (RAG) with LLMs to expose vulnerabilities in conflict scenarios. The authors propose credibility-aware strategies that integrate media background information into retrieval and generation. They find that incorporating source backgrounds at the answer-generation stage, especially with Chain-of-Thought prompting or explicit filtering, substantially improves verification performance. They also compare expert-verified (GT-MB) and automated (Hybrid-MB) credibility signals: expert annotations yield more reliable gains, while automated methods offer scalability but still face challenges. The study draws out practical implications for AI-assisted fact-checking, emphasizing rigorous source validation and human oversight to keep verification trustworthy in journalism, policy, and public discourse.
Abstract
Large Language Models (LLMs) augmented with retrieval mechanisms have demonstrated significant potential in fact-checking tasks by integrating external knowledge. However, their reliability decreases when confronted with conflicting evidence from sources of varying credibility. This paper presents the first systematic evaluation of Retrieval-Augmented Generation (RAG) models for fact-checking in the presence of conflicting evidence. To support this study, we introduce CONFACT (Conflicting Evidence for Fact-Checking; dataset available at https://github.com/zoeyyes/CONFACT), a novel dataset comprising questions paired with conflicting information from various sources. Extensive experiments reveal critical vulnerabilities in state-of-the-art RAG methods, particularly in resolving conflicts stemming from differences in media source credibility. To address these challenges, we investigate strategies to integrate media background information into both the retrieval and generation stages. Our results show that effectively incorporating source credibility significantly enhances the ability of RAG models to resolve conflicting evidence and improve fact-checking performance.
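
As a rough illustration of what integrating media background information at the generation stage could look like, the following minimal Python sketch filters retrieved passages by a credibility label and attaches each source's background to the prompt. It is not the authors' implementation: the `Evidence` fields, the boolean `credible` label, the fallback rule, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    # All fields are hypothetical; the actual CONFACT schema may differ.
    source: str       # name of the media outlet
    text: str         # retrieved passage
    background: str   # short media background description
    credible: bool    # assumed binary credibility label


def filter_credible(evidence):
    """Keep passages from sources labelled credible.

    Assumption: if no passage survives, fall back to the full list
    so that the model still receives some evidence.
    """
    kept = [e for e in evidence if e.credible]
    return kept if kept else list(evidence)


def build_prompt(question, evidence):
    """Build a prompt that shows each passage with its source background."""
    lines = [f"Question: {question}", "Evidence:"]
    for i, e in enumerate(evidence, 1):
        lines.append(f"[{i}] Source: {e.source} ({e.background})")
        lines.append(f"    {e.text}")
    # Illustrative Chain-of-Thought style instruction, not the paper's wording.
    lines.append(
        "Reason step by step about which sources are trustworthy, "
        "then answer Yes or No."
    )
    return "\n".join(lines)


docs = [
    Evidence("Outlet A", "The claim is true.", "established newswire", True),
    Evidence("Outlet B", "The claim is false.", "satire site", False),
]
prompt = build_prompt("Is the claim true?", filter_credible(docs))
```

Here `prompt` would then be passed to whichever LLM performs the final verification; filtering before prompting is one option, while keeping all passages and relying on the background text alone is another.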
