Zero-shot and Few-shot Learning with Instruction-following LLMs for Claim Matching in Automated Fact-checking
Dina Pisarevskaya, Arkaitz Zubiaga
TL;DR
The paper addresses claim matching (CM) in automated fact-checking under data scarcity by studying zero-shot and few-shot learning with instruction-following LLMs. It reframes CM as a paraphrase detection (PD) or natural language inference (NLI) task via carefully engineered prompts, evaluates four models on short and long texts, and introduces the ClaimMatch dataset. PD and NLI prompts enable strong CM performance in few-shot regimes (e.g., Mistral with PD-6 reaching ~95% F1 and Gemini with NLI-5+PD-6 reaching ~97.2% F1), sometimes matching or approaching fine-tuned baselines, and ensemble prompts can improve robustness. The work provides a practical CM pipeline, analyzes error modes, and proposes dataset expansion and automated prompt search as directions for advancing data-efficient CM in real-world fact-checking.
Abstract
The claim matching (CM) task can benefit an automated fact-checking pipeline by grouping claims that can be resolved with the same fact-check. In this work, we are the first to explore zero-shot and few-shot learning approaches to the task. We treat CM as a binary classification task and experiment with a set of instruction-following large language models (GPT-3.5-turbo, Gemini-1.5-flash, Mistral-7B-Instruct, and Llama-3-8B-Instruct), investigating prompt templates for each. We introduce a new CM dataset, ClaimMatch, which will be released upon acceptance. We put LLMs to the test on the CM task and find that the task can be tackled by leveraging more mature yet similar tasks such as natural language inference or paraphrase detection. We also propose a pipeline for CM, which we evaluate on texts of different lengths.
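To make the framing concrete, the following is a minimal sketch of CM as binary classification through a paraphrase-detection-style prompt, scored with F1. It is an illustration under stated assumptions, not the paper's implementation: the prompt wording in `PD_TEMPLATE`, the helper names (`build_prompt`, `parse_answer`, `match_claims`, `f1_score`), and the `llm` callable (any function mapping a prompt string to a completion string) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical prompt wording (an assumption); the paper's actual
# templates are not reproduced here.
PD_TEMPLATE = (
    "Are the following two claims paraphrases of each other? "
    "Answer Yes or No.\n"
    "Claim 1: {a}\nClaim 2: {b}\nAnswer:"
)

Demo = Tuple[str, str, bool]  # (claim A, claim B, do they match?)


def build_prompt(a: str, b: str, demos: Sequence[Demo] = ()) -> str:
    """Zero-shot when `demos` is empty, few-shot otherwise."""
    parts = [
        PD_TEMPLATE.format(a=x, b=y) + " " + ("Yes" if label else "No")
        for x, y, label in demos
    ]
    parts.append(PD_TEMPLATE.format(a=a, b=b))
    return "\n\n".join(parts)


def parse_answer(completion: str) -> bool:
    """Map a free-text completion to a binary label (True = match)."""
    return completion.strip().lower().startswith("yes")


def match_claims(
    pairs: Sequence[Tuple[str, str]],
    llm: Callable[[str], str],  # assumed interface: prompt in, text out
    demos: Sequence[Demo] = (),
) -> List[bool]:
    return [parse_answer(llm(build_prompt(a, b, demos))) for a, b in pairs]


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 for the positive (matching) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Passing a handful of labeled pairs as `demos` gives the few-shot variant. An NLI-style variant would only swap the template text and the answer parsing (for example, treating an entailment answer as a match), which is one way to read the idea of borrowing from more mature, similar tasks.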
