When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection
Alamgir Munir Qazi, John P. McCrae, Jamal Abdul Nasir
TL;DR
The paper addresses the need for scalable, reliable fact verification without relying on costly LLM-generated explanations. It introduces DeReC, a three-stage pipeline that uses dense sentence embeddings, FAISS-based retrieval, and a fine-tuned DeBERTa-v3-large classifier to ground claims in external evidence. On RAWFC and LIAR-RAW, DeReC achieves state-of-the-art or closely competitive F1 scores while delivering substantial runtime savings (approximately 95% on RAWFC and approximately 92% on LIAR-RAW) compared with LLM-based baselines. These results demonstrate that carefully engineered dense retrieval with targeted classification can match or surpass LLM explanations in veracity tasks, while enabling practical deployment on commodity hardware.
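To make the three-stage flow concrete, here is a minimal sketch of its shape, not the authors' implementation. The names `make_toy_embedder`, `retrieve`, and `build_classifier_input` are hypothetical. The embedder is a toy bag-of-words stand-in for a real dense sentence encoder, the retrieval step is a brute-force exact inner-product search over unit vectors (the same ranking a flat inner-product FAISS index would produce), and the final step only assembles a claim-plus-evidence string in an assumed format that a fine-tuned classifier such as DeBERTa-v3-large could consume.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_toy_embedder(corpus):
    """Stage 1 (toy): build a bag-of-words embedder over the corpus vocabulary.

    Assumption: this stands in for a real dense sentence encoder.
    """
    vocab = {}
    for sentence in corpus:
        for tok in tokenize(sentence):
            vocab.setdefault(tok, len(vocab))

    def embed(text):
        vec = [0.0] * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # out-of-vocabulary tokens are ignored
                vec[vocab[tok]] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]  # unit length, so dot product = cosine

    return embed


def retrieve(claim, corpus, embed, k=2):
    """Stage 2: exact top-k inner-product search over unit vectors."""
    query = embed(claim)
    scored = []
    for idx, sentence in enumerate(corpus):
        score = sum(a * b for a, b in zip(query, embed(sentence)))
        scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [(corpus[idx], score) for score, idx in scored[:k]]


def build_classifier_input(claim, evidence, sep=" [SEP] "):
    """Stage 3 input: claim joined with retrieved evidence (hypothetical format)."""
    return claim + sep + " ".join(sentence for sentence, _ in evidence)


corpus = [
    "Scientists confirm the moon is made of rock, not cheese.",
    "The stock market fell on Tuesday.",
    "A local bakery won an award for its bread.",
]
claim = "The moon is made of cheese."
embed = make_toy_embedder(corpus)
top = retrieve(claim, corpus, embed, k=2)
text = build_classifier_input(claim, top)
```

In a real system the `embed` callable would wrap a pretrained sentence-embedding model and the loop in `retrieve` would be replaced by an index lookup. The rest of the flow stays the same, which is why the embedder is passed in as a parameter here.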
Abstract
The proliferation of misinformation necessitates robust yet computationally efficient fact verification systems. While current state-of-the-art approaches leverage Large Language Models (LLMs) for generating explanatory rationales, these methods face significant computational barriers and hallucination risks in real-world deployments. We present DeReC (Dense Retrieval Classification), a lightweight framework that demonstrates how general-purpose text embeddings can effectively replace autoregressive LLM-based approaches in fact verification tasks. By combining dense retrieval with specialized classification, our system achieves better accuracy while being significantly more efficient. DeReC outperforms explanation-generating LLMs in efficiency, reducing runtime by 95% on RAWFC (23 minutes 36 seconds compared to 454 minutes 12 seconds) and by 92% on LIAR-RAW (134 minutes 14 seconds compared to 1692 minutes 23 seconds), showcasing its effectiveness across varying dataset sizes. On the RAWFC dataset, DeReC achieves an F1 score of 65.58%, surpassing the state-of-the-art method L-Defense (61.20%). Our results demonstrate that carefully engineered retrieval-based systems can match or exceed LLM performance in specialized tasks while being significantly more practical for real-world deployment.
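The runtime reductions quoted above can be checked directly from the reported wall-clock times. The short sketch below converts each timing to seconds and computes the relative reduction; the helper names `to_seconds` and `runtime_reduction` are illustrative and not taken from the paper.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) timing to total seconds."""
    return minutes * 60 + seconds


def runtime_reduction(fast_s, slow_s):
    """Percentage by which the faster system reduces the slower system's runtime."""
    return 100.0 * (1.0 - fast_s / slow_s)


# Timings as reported in the abstract: DeReC vs. the LLM-based baseline.
rawfc = runtime_reduction(to_seconds(23, 36), to_seconds(454, 12))
liar_raw = runtime_reduction(to_seconds(134, 14), to_seconds(1692, 23))

print(f"RAWFC reduction: {rawfc:.1f}%")
print(f"LIAR-RAW reduction: {liar_raw:.1f}%")
```

The computed values come out at roughly 94.8% and 92.1%, consistent with the rounded 95% and 92% figures in the abstract.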
