Fact-Checking Complex Claims with Program-Guided Reasoning

Liangming Pan, Xiaobao Wu, Xinyuan Lu, Anh Tuan Luu, William Yang Wang, Min-Yen Kan, Preslav Nakov

TL;DR

ProgramFC is a few-shot neuro-symbolic framework for fact-checking that decomposes complex claims into executable reasoning programs. A Codex-based program generator produces a sequence of sub-task calls. Specialized functions (Question, Verify, Predict) then execute these calls to produce a veracity label together with an explanation of the reasoning. The approach is explainable and data-efficient. It achieves state-of-the-art results on the multi-hop datasets HOVER and FEVEROUS, with particularly large gains on claims that require deeper reasoning, and it retrieves evidence more effectively through iterative, program-guided retrieval. The work also analyzes interpretability, error modes, and limitations such as computational cost, and it outlines future directions toward broader applicability and multi-modal reasoning.
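
The execution stage can be illustrated with a minimal Python sketch. It shows a program being run step by step, with each step delegated to a sub-task handler and results passed forward through named variables. The sketch makes several assumptions. The step syntax (`var = Function("...")`, with `{var}` placeholders filled from earlier answers) is modeled loosely on the programs the paper describes. The paper implements the `Question` and `Verify` handlers with language models; here they are replaced by hypothetical toy stubs. `Predict` is reduced to evaluating `and`/`or`/`not` over earlier results. The helpers `unquote`, `eval_logic`, and `execute` are illustrative names, not part of the ProgramFC code.

```python
import re
from typing import Callable, Dict, Optional

# Assumed, illustrative step syntax: <var> = <Function>(<argument>)
STEP = re.compile(r"^(\w+)\s*=\s*(\w+)\((.*)\)$")


def unquote(arg: str, env: Dict[str, object]) -> str:
    """Strip quotes and an optional f-prefix, then fill {var} from earlier steps."""
    arg = arg.strip()
    if arg.startswith('f"'):
        arg = arg[1:]
    text = arg.strip('"')
    return re.sub(r"\{(\w+)\}", lambda m: str(env[m.group(1)]), text)


def eval_logic(expr: str, env: Dict[str, object]) -> bool:
    """Evaluate 'or' / 'and' / 'not' over earlier step results (no parentheses)."""
    def literal(lit: str) -> bool:
        *prefix, name = lit.split()
        if any(word != "not" for word in prefix):
            raise ValueError(f"unsupported literal: {lit!r}")
        value = bool(env[name])
        return value if len(prefix) % 2 == 0 else not value

    # 'and' binds tighter than 'or', as in Python.
    return any(all(literal(lit) for lit in conj.split(" and "))
               for conj in expr.split(" or "))


def execute(program: str,
            question: Callable[[str], str],
            verify: Callable[[str], bool]) -> bool:
    """Run the steps in order, delegating each one to its sub-task handler."""
    env: Dict[str, object] = {}
    label: Optional[bool] = None
    for line in (raw.strip() for raw in program.splitlines()):
        if not line:
            continue
        match = STEP.match(line)
        if match is None:
            raise ValueError(f"cannot parse step: {line!r}")
        var, func, arg = match.groups()
        if func == "Question":
            env[var] = question(unquote(arg, env))
        elif func == "Verify":
            env[var] = verify(unquote(arg, env))
        elif func == "Predict":
            label = eval_logic(arg, env)
            env[var] = label
        else:
            raise ValueError(f"unknown sub-task function: {func}")
    if label is None:
        raise ValueError("program has no Predict step")
    return label


# Toy usage: hypothetical stubs stand in for the model-based handlers.
TRUE_STATEMENTS = {
    "William Shakespeare was born in Stratford-upon-Avon.",
    "Hamlet is a tragedy.",
}
program = """
answer_1 = Question("Who wrote Hamlet?")
fact_1 = Verify(f"{answer_1} was born in Stratford-upon-Avon.")
fact_2 = Verify("Hamlet is a tragedy.")
label = Predict(fact_1 and fact_2)
"""
print(execute(program,
              question=lambda q: "William Shakespeare",
              verify=lambda s: s in TRUE_STATEMENTS))  # True
```

Keeping every intermediate result in a named variable is what makes the run inspectable. After execution, `env` holds each answer and sub-claim verdict that led to the final label.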

Abstract

Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning. In this paper, we present Program-Guided Fact-Checking (ProgramFC), a novel fact-checking model that decomposes complex claims into simpler sub-tasks that can be solved using a shared library of specialized functions. We first leverage the in-context learning ability of large language models to generate reasoning programs to guide the verification process. Afterward, we execute the program by delegating each sub-task to the corresponding sub-task handler. This process makes our model both explanatory and data-efficient, providing clear explanations of its reasoning process and requiring minimal training data. We evaluate ProgramFC on two challenging fact-checking datasets and show that it outperforms seven fact-checking baselines across different settings of evidence availability, with explicit output programs that benefit human debugging. Our code and data are publicly available at https://github.com/mbzuai-nlp/ProgramFC.
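
Program generation uses in-context learning rather than fine-tuning. The sketch below shows how such a few-shot prompt might be assembled from a task instruction, a handful of (claim, program) demonstrations, and the new claim. The instruction wording, the `Claim:`/`Program:` layout, the demonstration, and the `build_prompt` helper are all illustrative assumptions, not the paper's templates. The call to the language model itself is left out.

```python
from typing import List, Tuple

# Illustrative placeholder wording; not the paper's actual instruction text.
INSTRUCTION = "Write a program of sub-task calls that verifies the claim step by step."


def build_prompt(demos: List[Tuple[str, str]], claim: str) -> str:
    """Join the instruction, (claim, program) demonstrations, and the new claim."""
    parts = [INSTRUCTION, ""]
    for demo_claim, demo_program in demos:
        parts += [f"Claim: {demo_claim}", "Program:", demo_program.strip(), ""]
    # The language model is expected to continue the text after the final "Program:".
    parts += [f"Claim: {claim}", "Program:"]
    return "\n".join(parts)


# Hypothetical demonstration in the assumed Claim/Program layout.
demos = [(
    "Hamlet and Macbeth were both written by William Shakespeare.",
    'fact_1 = Verify("Hamlet was written by William Shakespeare.")\n'
    'fact_2 = Verify("Macbeth was written by William Shakespeare.")\n'
    "label = Predict(fact_1 and fact_2)",
)]
prompt = build_prompt(demos, "The author of Hamlet was born in Stratford-upon-Avon.")
print(prompt)
```

Ending the prompt on an open `Program:` line steers the model toward producing only the program for the new claim, in the same format as the demonstrations.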

Paper Structure

This paper contains 38 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Overview of our ProgramFC model, which consists of two modules: (i) Program Generation generates a reasoning program for the input claim using Codex with in-context learning, and then (ii) Program Execution sequentially interprets the program by delegating each step to the corresponding sub-task function.
  • Figure 2: The Codex prompt template used to generate reasoning programs, consisting of a task instruction, in-context examples, and a prompt for the <input_claim>. The full templates are given in the Appendix.
  • Figure 3: Implementation of the question-answering sub-task function for three different settings.
  • Figure 4: F1 score for fact-checking with gold evidence using FLAN-T5 (blue line) and ProgramFC (green line) for language models of increasing size: FLAN-T5-small (80M), FLAN-T5-base (250M), FLAN-T5-large (780M), FLAN-T5-XL (3B), and FLAN-T5-XXL (11B) on HOVER 2-hop (left), 3-hop (middle), and 4-hop (right).
  • Figure 5: Retrieval recall@10 for the one-step retrieval and the iterative retrieval in ProgramFC.
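
Figures 4 and 5 report F1 and retrieval recall@10. Both metrics are sketched below under stated assumptions. Recall@k is taken here to be the fraction of a claim's gold evidence documents found among the top-k retrieved ones. F1 is assumed to be macro-averaged over the veracity labels. The document IDs and labels in the usage lines are made up for illustration and are not results from the paper.

```python
from typing import List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Assumed definition: share of gold evidence docs found in the top-k results."""
    if not gold:
        raise ValueError("gold evidence set must be non-empty")
    return len(set(retrieved[:k]) & gold) / len(gold)


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Assumed macro average: unweighted mean of per-label F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up document IDs and labels, for illustration only.
print(recall_at_k(["doc_A", "doc_C", "doc_D"], {"doc_A", "doc_B"}))  # 0.5
print(macro_f1(["SUPPORTS", "REFUTES", "SUPPORTS", "REFUTES"],
               ["SUPPORTS", "SUPPORTS", "SUPPORTS", "REFUTES"]))
```

Averaging recall_at_k over all claims gives a dataset-level recall@10. One-step and iterative retrieval can then be compared by feeding each method's ranked list through the same function.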