QuestGen: Effectiveness of Question Generation Methods for Fact-Checking Applications

Ritvik Setty, Vinay Setty

TL;DR

Complex claims are hard to verify, so the authors automate question generation to decompose claims and guide evidence gathering. They show that small, fine-tuned models with data augmentation can match or exceed LLMs at question generation (by up to 8%), and that in some cases the evidence retrieved with machine-generated questions supports verification better than evidence retrieved with human-written questions. A large-scale benchmark with five datasets (including synthetic data) demonstrates this across automated and manual evaluations. The paper provides code and data and demonstrates practical impact for scalable fact-checking using question-driven retrieval and natural language inference (NLI). Overall, the approach offers a scalable, evidence-guided path to enhance automated fact-checking systems.

Abstract

Verifying fact-checking claims poses a significant challenge, even for humans. Recent approaches have demonstrated that decomposing claims into relevant questions to gather evidence enhances the efficiency of the fact-checking process. In this paper, we provide empirical evidence showing that this question decomposition can be effectively automated. We demonstrate that smaller generative models, fine-tuned for the question generation task using data augmentation from various datasets, outperform large language models by up to 8%. Surprisingly, in some cases, the evidence retrieved using machine-generated questions proves to be significantly more effective for fact-checking than that obtained from human-written questions. We also perform manual evaluation of the decomposed questions to assess the quality of the questions generated.

Paper Structure

This paper contains 18 sections, 1 figure, and 4 tables.

Figures (1)

  • Figure 1: Prompt, claim, and questions generated using Mistral.