Claim Verification in the Age of Large Language Models: A Survey
Alphaeus Dmonte, Roland Oruche, Marcos Zampieri, Prasad Calyam, Isabelle Augenstein
TL;DR
The paper addresses the problem of verifying claims in the era of large language models and pervasive online misinformation. It provides a comprehensive survey of LLM-based claim verification frameworks, detailing pipeline components such as retrieval, prompting, transfer learning, and generation, with a focus on retrieval-augmented generation (RAG). It catalogs public English datasets, metrics, and shared tasks, and discusses open challenges including irrelevant context, knowledge conflicts, and multilinguality, offering guidance for future research. The work serves as a foundational guide for researchers and practitioners aiming to build robust, explainable, and scalable fact-checking systems using LLMs.
Abstract
The large and ever-increasing amount of data available on the Internet, coupled with the laborious task of manual claim and fact verification, has sparked interest in the development of automated claim verification systems. Several deep learning and transformer-based models have been proposed for this task over the years. With the introduction of Large Language Models (LLMs) and their superior performance in several NLP tasks, we have seen a surge of LLM-based approaches to claim verification, along with the use of novel methods such as Retrieval-Augmented Generation (RAG). In this survey, we present a comprehensive account of recent claim verification frameworks using LLMs. We describe in detail the different components of the claim verification pipeline used in these frameworks, including common approaches to retrieval, prompting, and fine-tuning. Finally, we describe publicly available English datasets created for this task.
