FactIR: A Real-World Zero-shot Open-Domain Retrieval Benchmark for Fact-Checking
Venktesh V, Vinay Setty
TL;DR
This work introduces FactIR, a real-world open-domain retrieval benchmark for fact-checking derived from Factiverse production logs with expert annotations, enabling rigorous zero-shot evaluation of retrieval systems. The study comprehensively benchmarks lexical, sparse, dense, and re-ranking approaches, revealing that traditional lexical methods often outperform dense retrievers in real-world settings, while semantic clustering-based training yields strong generalization (e.g., Snowflake-arctic-embed-s). LLM-based re-rankers further improve performance, highlighting the value of cross-domain generalization for open-domain fact-checking. The authors provide practical guidance and release a reusable library to facilitate extending retrievers and re-rankers in real-world fact-checking pipelines.
Abstract
Automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios. A significant challenge in this process is not only retrieving relevant information, but also identifying evidence that can either support or refute complex claims. Traditional retrieval methods may return documents that directly address claims or lean toward supporting them, but they often struggle with more complex claims that require indirect reasoning. While some existing benchmarks and methods target retrieval for fact-checking, a comprehensive real-world open-domain benchmark has been lacking. In this paper, we present FactIR, a real-world retrieval benchmark derived from Factiverse production logs and enriched with human annotations. We rigorously evaluate state-of-the-art retrieval models in a zero-shot setup on FactIR and offer insights for developing practical retrieval systems for fact-checking. Code and data are available at https://github.com/factiverse/factIR.
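To make the zero-shot evaluation setup concrete, the following is a minimal sketch of how a retriever's ranked output could be scored against human relevance annotations without any training on the benchmark. It is not the API of the released factIR library. The choice of nDCG@k and Recall@k is an assumption (the abstract does not name the metrics), and the function names, claim IDs, and document IDs are hypothetical toy values.

```python
import math

def dcg_at_k(gains, k):
    # Discounted cumulative gain over the top-k graded relevance values.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    # relevance: dict mapping doc_id -> graded relevance for one claim.
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_doc_ids, relevance, k=10):
    # Fraction of annotated relevant documents found in the top k.
    relevant = {doc_id for doc_id, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_doc_ids[:k])) / len(relevant)

def evaluate(run, qrels, k=10):
    # run: claim_id -> ranked list of doc_ids returned by any retriever.
    # qrels: claim_id -> {doc_id: relevance} from human annotation.
    # Claims missing from the run are scored as empty rankings.
    ndcg = [ndcg_at_k(run.get(cid, []), rels, k) for cid, rels in qrels.items()]
    recall = [recall_at_k(run.get(cid, []), rels, k) for cid, rels in qrels.items()]
    n = len(qrels)
    return {"ndcg": sum(ndcg) / n, "recall": sum(recall) / n}

# Hypothetical toy annotations and retriever output (not FactIR data).
toy_qrels = {"claim1": {"d1": 1, "d2": 1}, "claim2": {"d3": 1}}
toy_run = {"claim1": ["d1", "d9", "d2"], "claim2": ["d8", "d3"]}

scores = evaluate(toy_run, toy_qrels, k=10)
```

Because `evaluate` only consumes ranked lists, the same function can score lexical, sparse, dense, or re-ranked outputs, which is what makes a side-by-side zero-shot comparison possible.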
