VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification
Haosheng Qian, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Qi Chen, Dawei Yin, Xueqi Cheng
TL;DR
VeriCite addresses the hallucination problem in Retrieval-Augmented Generation by introducing a three-stage framework that rigorously verifies evidence and pre-attributes citations. The pipeline comprises initial answer generation with NLI-based verification, targeted evidence selection with cross-passage entailment checks, and final answer refinement that synthesizes verified statements with grounded citations. Across five open-source LLMs and four datasets, VeriCite consistently improves citation quality while maintaining competitive answer accuracy, though performance on non-fact-based questions and certain multi-hop tasks varies by model. The work highlights the importance of a dedicated NLI verifier for reliable attribution and provides ablation evidence supporting the contribution of each component.
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a crucial approach for enhancing the responses of large language models (LLMs) with external knowledge sources. Despite its impressive performance on complex question-answering tasks, RAG still struggles with hallucinations. Attributing RAG-generated content through in-line citations has shown potential for reducing hallucinations and facilitating human verification. Existing citation generation methods rely primarily on either fine-tuning the generator or applying post-processing for citation matching. However, the former demands substantial annotated data and computational resources, while the latter often has difficulty handling multiple citations and frequently produces suboptimal results. In this paper, we introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. Specifically, VeriCite decomposes generation into three stages: 1) initial answer generation, which produces a response based on all available contexts and verifies its claims with a natural language inference (NLI) model; 2) supporting evidence selection, which assesses the utility of each document and extracts useful supporting evidence; and 3) final answer refinement, which integrates the initial response and the collected evidence to produce the final, refined answer. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
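The three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `vericite_sketch`, `toy_generate`, and `toy_entails` are hypothetical, the generator is a stub that echoes its passages, the NLI model is replaced by a token-overlap check, and the final refinement (an LLM call in the framework described above) is approximated here by a deterministic merge. Only the control flow of generate, verify, select evidence, and refine is meant to carry over.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical interfaces; none of these names come from the paper.
Generator = Callable[[str, List[str]], str]  # (question, passages) -> text
Entails = Callable[[str, str], bool]         # (premise, hypothesis) -> supported?


@dataclass
class Evidence:
    doc_id: int     # 1-based document index, used as the citation marker
    statement: str  # statement judged to be supported by that document


def split_claims(text: str) -> List[str]:
    """Naive sentence splitter standing in for real claim segmentation."""
    return [s.strip() for s in text.split(".") if s.strip()]


def vericite_sketch(question: str, docs: List[str],
                    generate: Generator, entails: Entails) -> str:
    # Stage 1: draft an answer from all documents, then keep only the
    # claims that the entailment check supports against the full context.
    draft = generate(question, docs)
    context = " ".join(docs)
    verified = [c for c in split_claims(draft) if entails(context, c)]

    # Stage 2: for each document, generate candidate evidence and keep a
    # statement only if that single document entails it.
    evidence: List[Evidence] = []
    for doc_id, doc in enumerate(docs, start=1):
        for claim in split_claims(generate(question, [doc])):
            if entails(doc, claim):
                evidence.append(Evidence(doc_id, claim))

    # Stage 3 (simplified): merge verified claims with collected evidence.
    # Assumption: a deterministic merge replaces the LLM refinement step.
    merged: Dict[str, Set[int]] = {}
    for claim in verified:
        merged[claim] = {i for i, d in enumerate(docs, start=1)
                         if entails(d, claim)}
    for ev in evidence:
        merged.setdefault(ev.statement, set()).add(ev.doc_id)

    parts = []
    for statement, ids in merged.items():
        marks = "".join(f"[{i}]" for i in sorted(ids))
        parts.append(f"{statement} {marks}".strip() + ".")
    return " ".join(parts)


def toy_generate(question: str, passages: List[str]) -> str:
    # Stub generator: echoes the passages and adds one unsupported
    # sentence so the verification step has something to filter out.
    return " ".join(passages) + " The moon is made of cheese."


def toy_entails(premise: str, hypothesis: str) -> bool:
    # Stub NLI: "entailed" if every hypothesis token occurs in the premise.
    tokens = lambda t: set(re.findall(r"\w+", t.lower()))
    return tokens(hypothesis) <= tokens(premise)


docs = ["Paris is the capital of France.", "France is in Europe."]
print(vericite_sketch("Where is Paris?", docs, toy_generate, toy_entails))
# prints: Paris is the capital of France [1]. France is in Europe [2].
```

In this toy run the unsupported sentence is dropped in stage 1 and never reaches the output, while each surviving statement carries the index of the document that entails it. Swapping `toy_generate` and `toy_entails` for a real LLM call and a real NLI classifier would leave `vericite_sketch` unchanged, which is the reason for passing both in as callables.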
