Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation
Sirui Xia, Xintao Wang, Jiaqing Liang, Yifei Zhang, Weikang Zhou, Jiaji Deng, Fei Yu, Yanghua Xiao
TL;DR
The paper tackles verifiability and credibility in Retrieval-Augmented Generation by introducing ReClaim, a method that interleaves sentence-level references and claims to produce fine-grained attributions in long-form answers. It builds specialized training data from WebGLM-QA and ELI5, and uses constrained decoding with a prefix tree to ensure references precisely align with generated sentences. Two main variants are proposed: ReClaim_Unified, which generates references and claims end to end in a single step, and ReClaim w/IG, which trains a separate ReferModel and ClaimModel and alternates their outputs during inference. Across ASQA, ELI5, and EXPERTQA, ReClaim improves citation quality and verifiability (high CAS and reduced citation length) while maintaining strong fluency, at some cost to overall answer accuracy under certain configurations. The work demonstrates that sentence-level attribution via interleaved generation can meaningfully enhance the credibility and verifiability of RAG-based QA systems, with practical value for systems that require verifiable sourcing and efficient fact-checking.
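The prefix-tree constraint mentioned above can be illustrated with a minimal sketch. It assumes the constraint is that each reference must be copied verbatim from a sentence of the retrieved passages, which this summary does not spell out. The names `build_trie`, `allowed_next`, `constrained_greedy`, the `END` marker, and the whitespace-level tokens are all hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, List, Set

END = "<eos>"  # hypothetical end-of-reference marker (an assumption)

def build_trie(sentences: List[List[str]]) -> Dict:
    """Build a nested-dict prefix tree over tokenized source sentences."""
    root: Dict = {}
    for tokens in sentences:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root

def allowed_next(trie: Dict, prefix: List[str]) -> Set[str]:
    """Tokens that may follow `prefix`; empty if `prefix` is not in the tree."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return set(node.keys())

def constrained_greedy(trie: Dict,
                       score: Callable[[List[str], str], float]) -> List[str]:
    """Greedy decoding restricted to paths in the tree.

    `score(prefix, token)` stands in for a language model's next-token score.
    """
    out: List[str] = []
    while True:
        options = allowed_next(trie, out)
        if not options:
            break
        # Sort first so ties are broken deterministically.
        tok = max(sorted(options), key=lambda t: score(out, t))
        if tok == END:
            break
        out.append(tok)
    return out

# Toy usage: two source sentences and a scorer that prefers the token "sea".
sources = [["the", "sky", "is", "blue"], ["the", "sea", "is", "deep"]]
trie = build_trie(sources)
reference = constrained_greedy(trie, lambda prefix, tok: 1.0 if tok == "sea" else 0.0)
```

Because every decoding step is limited to children of the current tree node, the decoded reference is always one of the source sentences, however the scorer behaves. A real system would apply the same mask to the model's vocabulary logits rather than to string tokens.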
Abstract
Retrieval-Augmented Generation (RAG) has been widely adopted to enhance Large Language Models (LLMs) in knowledge-intensive tasks. To improve credibility and verifiability in RAG systems, Attributed Text Generation (ATG) has been proposed, which adds citations to the retrieved knowledge in LLM-generated responses. Prior methods mostly rely on coarse-grained attribution, citing whole passages or paragraphs, which falls short on verifiability. This paper proposes ReClaim (Refer & Claim), a fine-grained ATG method that alternates the generation of references and answers step by step. Unlike previous coarse-grained attribution methods, ReClaim provides sentence-level citations in long-form question-answering tasks. Extensive experiments across a range of settings verify the effectiveness of ReClaim, which achieves a citation accuracy of 90%.
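The step-by-step alternation of references and answer sentences might look roughly like the loop below. This is only a sketch of the control flow under stated assumptions: `refer` and `claim` are hypothetical callables standing in for trained models, the stopping rule (the reference step returning `None`) and the `max_steps` cap are illustrative choices, and the toy functions at the end exist only to make the example runnable.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (reference sentence, claim sentence)
ReferFn = Callable[[str, List[str], List[Step]], Optional[str]]
ClaimFn = Callable[[str, str, List[Step]], str]

def interleaved_generate(question: str,
                         passages: List[str],
                         refer: ReferFn,
                         claim: ClaimFn,
                         max_steps: int = 8) -> List[Step]:
    """Alternate between picking a reference and writing a claim grounded in it."""
    steps: List[Step] = []
    for _ in range(max_steps):
        ref = refer(question, passages, steps)
        if ref is None:  # assumed stopping signal: nothing left to cite
            break
        steps.append((ref, claim(question, ref, steps)))
    return steps

# Toy stand-ins (not real models): cite each passage once, in order.
def toy_refer(question: str, passages: List[str], steps: List[Step]) -> Optional[str]:
    used = {ref for ref, _ in steps}
    for passage in passages:
        if passage not in used:
            return passage
    return None

def toy_claim(question: str, ref: str, steps: List[Step]) -> str:
    return f"Supported by: {ref}"

answer = interleaved_generate("q", ["A.", "B."], toy_refer, toy_claim)
```

Each element of `answer` pairs one reference with the claim generated immediately after it, which is what makes the attribution sentence-level: a reader can check each claim against exactly one cited sentence.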
