EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes
Yuqin Dai, Guoqing Wang, Yuan Wang, Kairan Dou, Kaichen Zhou, Zhanwei Zhang, Shuo Yang, Fei Tang, Jun Yin, Pengyu Zeng, Zhenzhe Ying, Can Yi, Changhua Meng, Yuchen Zhou, Yongliang Shen, Shuai Lu
TL;DR
This work tackles the challenge of noisy external evidence and error propagation in Retrieval-Augmented Generation by introducing EviNote-RAG, which restructures the retrieval loop into a retrieve–note–answer pipeline. It generates Supportive-Evidence Notes (SENs) that distill answer-relevant information and annotate key and uncertain content, and it employs an entailment-based Evidence Quality Reward (EQR) to ensure that SENs are logically sufficient to derive the final answer. Trained end to end with reinforcement learning using GRPO, the framework achieves state-of-the-art results on both in-domain and out-of-domain QA benchmarks, while improving training stability and efficiency. The approach provides a principled recipe for integrating structured note-taking with reward design to produce more interpretable, faithful, and robust RAG systems.
Abstract
Retrieval-Augmented Generation (RAG) has advanced open-domain question answering by incorporating external information into model reasoning. However, leveraging that information effectively raises two challenges: (1) a low signal-to-noise ratio, where answer-supportive information is diluted by irrelevant material, and (2) error accumulation, which arises in multi-hop reasoning when incomplete or misleading information is incorporated. To address these challenges, we introduce EviNote-RAG, a framework that follows a retrieve-note-answer workflow. Instead of reasoning directly over raw external information, the model first produces Supportive-Evidence Notes (SENs), which concisely preserve answer-critical information and explicitly mark key and uncertain information to improve accuracy. We further design an entailment-based Evidence Quality Reward (EQR) to ensure that SENs are logically sufficient to derive the final answer, thereby improving their quality. Experiments on both in-domain and out-of-domain QA benchmarks show that EviNote-RAG achieves state-of-the-art performance, improving answer accuracy, training stability, robustness, and efficiency. In particular, it yields relative F1 gains of 20% on HotpotQA (+0.093), 40% on Bamboogle (+0.151), and 91% on 2Wiki (+0.256), benefiting from improvements in the reasoning process.
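To make the retrieve-note-answer workflow concrete, the following is a minimal toy sketch, not the authors' implementation. Every name in it (`retrieve`, `write_note`, `evidence_quality_reward`) is hypothetical, and the word-overlap retriever, the overlap-based note filter, and the substring check standing in for an entailment model are all simplifying assumptions; the abstract does not specify the actual retriever, note format, entailment judge, or how EQR enters the training reward.

```python
import re


def tokens(text):
    """Lowercased word set, used by the toy retriever and note-taker."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, corpus, k=2):
    """Toy retriever (assumption): rank passages by word overlap with the question."""
    q = tokens(question)
    return sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)[:k]


def write_note(question, passages, min_overlap=2):
    """Toy note-taker (assumption): keep only passages that share at least
    `min_overlap` words with the question, discarding likely noise."""
    q = tokens(question)
    kept = [p for p in passages if len(q & tokens(p)) >= min_overlap]
    return " ".join(kept)


def evidence_quality_reward(note, answer):
    """Toy stand-in for an entailment check (assumption): reward 1.0 when the
    answer string appears in the note, else 0.0. A real system would use an
    entailment model rather than substring matching."""
    if not answer.strip():
        return 0.0
    return 1.0 if answer.lower() in note.lower() else 0.0


corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Bananas are rich in potassium.",
]
question = "What is the capital of France?"

passages = retrieve(question, corpus)   # retrieve step
note = write_note(question, passages)   # note step: distill supportive evidence
for candidate in ("Paris", "Lyon"):     # answer step would propose candidates
    print(candidate, evidence_quality_reward(note, candidate))
```

In this sketch the note step drops the weakly related Eiffel Tower passage, and the reward separates an answer the note supports from one it does not, which is the role the abstract assigns to EQR.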
