Can LLMs Automate Fact-Checking Article Writing?

Dhruv Sahnan; David Corney; Irene Larraz; Giovanni Zagni; Ruben Miguez; Zhuohan Xie; Iryna Gurevych; Elizabeth Churchill; Tanmoy Chakraborty; Preslav Nakov

TL;DR

This work addresses a gap in automatic fact-checking: automated systems rarely produce the full articles through which human fact-checkers communicate their findings. It proposes automatically generating complete fact-checking articles and introduces QRAFT, a three-agent, two-stage framework that mimics how human fact-checkers write. A Planner gathers evidence, a Writer composes the article, and an Editor refines it through simulated editorial feedback, with iterative planning and revision. On both automatic metrics and expert judgments, QRAFT outperforms several baselines but still falls short of expert-written articles, underscoring persistent quality and trust gaps for AI-generated journalism. These results motivate further research on integrating domain-specific guidelines, improving factual coherence, and ensuring transparent, verifiable citations before AI-generated articles can be used in public communication.
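The TL;DR describes QRAFT's Planner, Writer, and Editor roles only at a high level. The sketch below is a hypothetical illustration of such a plan-write-edit loop, not the authors' implementation. It assumes that each agent can be modeled as a plain function, that the Editor returns a list of comments where an empty list means the draft is accepted, and that planning and writing are rerun each round until acceptance or a round limit. All names (`run_pipeline`, `toy_planner`, `toy_writer`, `toy_editor`, `max_rounds`) and the toy acceptance rule are invented for illustration. In a real system, each agent would wrap an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent interfaces (assumptions, not QRAFT's actual API).
PlannerFn = Callable[[str, List[str]], List[str]]       # (claim, feedback) -> evidence
WriterFn = Callable[[str, List[str], List[str]], str]   # (claim, evidence, feedback) -> draft
EditorFn = Callable[[str], List[str]]                   # draft -> feedback ([] = accept)


@dataclass
class Result:
    text: str
    evidence: List[str]
    rounds: int      # number of plan-write-edit rounds performed
    accepted: bool   # True if the editor returned no feedback


def run_pipeline(claim: str, planner: PlannerFn, writer: WriterFn,
                 editor: EditorFn, max_rounds: int = 3) -> Result:
    feedback: List[str] = []
    evidence: List[str] = []
    text = ""
    for round_idx in range(max_rounds):
        evidence = planner(claim, feedback)        # (re)plan evidence using feedback
        text = writer(claim, evidence, feedback)   # compose a draft
        feedback = editor(text)                    # simulated editorial review
        if not feedback:                           # no comments: accept the draft
            return Result(text, evidence, round_idx + 1, True)
    return Result(text, evidence, max_rounds, False)  # round limit reached


# Toy stand-ins for LLM-backed agents (purely illustrative).
def toy_planner(claim: str, feedback: List[str]) -> List[str]:
    evidence = [f"source A on {claim!r}"]
    if any("citation" in f for f in feedback):
        evidence.append(f"source B on {claim!r}")
    return evidence


def toy_writer(claim: str, evidence: List[str], feedback: List[str]) -> str:
    cites = " ".join(f"[{i + 1}]" for i in range(len(evidence)))
    return f"Claim: {claim}. Assessment based on {len(evidence)} source(s) {cites}."


def toy_editor(draft: str) -> List[str]:
    # Toy rule: ask for a second citation until the draft contains one.
    return [] if "[2]" in draft else ["add another citation"]


result = run_pipeline("X causes Y", toy_planner, toy_writer, toy_editor)
print(f"accepted={result.accepted} after {result.rounds} rounds")
# prints: accepted=True after 2 rounds
```

In this toy run, the Editor rejects the first draft for having a single citation. The Planner then gathers a second source, and the revised draft is accepted. The round limit is one simple way to keep an iterative feedback loop from running indefinitely.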

Abstract

Automatic fact-checking aims to support professional fact-checkers by offering tools that can help speed up manual fact-checking. Yet, existing frameworks fail to address the key step of producing output suitable for broader dissemination to the general public: while human fact-checkers communicate their findings through fact-checking articles, automated systems typically produce little or no justification for their assessments. Here, we aim to bridge this gap. We argue for the need to extend the typical automatic fact-checking pipeline with automatic generation of full fact-checking articles. We first identify key desiderata for such articles through a series of interviews with experts from leading fact-checking organizations. We then develop QRAFT, an LLM-based agentic framework that mimics the writing workflow of human fact-checkers. Finally, we assess the practical usefulness of QRAFT through human evaluations with professional fact-checkers. Our evaluation shows that while QRAFT outperforms several previously proposed text-generation approaches, it lags considerably behind expert-written articles. We hope that our work will enable further research in this new and important direction.
