Agent-based Automated Claim Matching with Instruction-following LLMs
Dina Pisarevskaya, Arkaitz Zubiaga
TL;DR
The paper tackles automated claim matching (CM) with a two-step, agent-based pipeline: one LLM generates task-specific prompts, and a second LLM uses them to classify whether two claims match. By systematically exploring few-shot prompt selection and cross-model prompt generation, the authors show that LLM-generated prompts can exceed state-of-the-art (SOTA) results obtained with human-crafted prompts, and that smaller LLMs can generate prompts as effectively as larger ones, which saves compute. The work also examines how different LLMs contribute at each step and which prompts best capture CM semantics; in particular, framing CM as identifying the same event, topic, or idea yields strong results, while leaving room for improvement with additional markers. Overall, the approach advances automated prompt engineering for CM, reduces reliance on hand-written prompts, and offers practical guidance for resource-efficient, cross-LLM CM systems.
Abstract
We present a novel agent-based approach for the automated claim matching task with instruction-following LLMs. We propose a two-step pipeline that first generates prompts with LLMs and then performs claim matching as a binary classification task with LLMs. We demonstrate that LLM-generated prompts can outperform SOTA results obtained with human-generated prompts, and that smaller LLMs can do as well as larger ones in the generation process, which saves computational resources. We also demonstrate the effectiveness of using different LLMs for each step of the pipeline, i.e. one LLM for prompt generation and another for claim matching. Our investigation into the prompt generation process in turn reveals insights into the LLMs' understanding of claim matching.
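The two-step pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`generate_prompt`, `match_claims`), the meta-prompt wording, and the stub models are all hypothetical, and an LLM is modelled simply as a function from a prompt string to a completion string so that the example runs without any external service.

```python
from typing import Callable, List, Tuple

# Assumption: an LLM is any function mapping a text prompt to a text completion.
LLM = Callable[[str], str]


def generate_prompt(generator: LLM, examples: List[Tuple[str, str, bool]]) -> str:
    """Step 1: ask a generator LLM to write a claim-matching instruction
    from a handful of labelled claim pairs (few-shot examples)."""
    shots = "\n".join(
        f"Claim 1: {a}\nClaim 2: {b}\nMatch: {'yes' if label else 'no'}"
        for a, b, label in examples
    )
    # Hypothetical meta-prompt wording, not taken from the paper.
    meta_prompt = (
        "Below are pairs of claims labelled as matching or not.\n"
        f"{shots}\n"
        "Write one instruction that asks whether two claims match."
    )
    return generator(meta_prompt).strip()


def match_claims(classifier: LLM, instruction: str, claim_a: str, claim_b: str) -> bool:
    """Step 2: use the generated instruction with a (possibly different) LLM
    to make a binary match / no-match decision."""
    query = (
        f"{instruction}\n"
        f"Claim 1: {claim_a}\n"
        f"Claim 2: {claim_b}\n"
        "Answer yes or no."
    )
    answer = classifier(query).strip().lower()
    return answer.startswith("yes")


def stub_generator(meta_prompt: str) -> str:
    """Stand-in for a small prompt-generating LLM (returns a fixed instruction)."""
    return "Do the two claims refer to the same event, topic, or idea?"


def stub_classifier(query: str) -> str:
    """Stand-in for a classifier LLM: a toy word-overlap heuristic."""
    claim_lines = [ln for ln in query.splitlines() if ln.startswith("Claim ")]
    words_a = set(claim_lines[0].split(": ", 1)[1].lower().split())
    words_b = set(claim_lines[1].split(": ", 1)[1].lower().split())
    return "yes" if len(words_a & words_b) >= 3 else "no"


if __name__ == "__main__":
    few_shot = [
        ("The earth is flat", "Our planet is flat", True),
        ("The earth is flat", "Coffee cures cancer", False),
    ]
    instruction = generate_prompt(stub_generator, few_shot)
    print(instruction)
    print(match_claims(stub_classifier, instruction,
                       "The vaccine causes infertility in women",
                       "Vaccine causes infertility in young women"))
```

Keeping the generator and the classifier as separate callables mirrors the idea of using different LLMs for each step: either stub could be swapped for a real model client without changing the pipeline functions.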
