Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
Chenyu Lin, Yilin Wen, Du Su, Hexiang Tan, Fei Sun, Muhan Chen, Chenfu Bao, Zhonghou Lyu
TL;DR
Knowledgeable-R1 addresses the susceptibility of retrieval-augmented generation to contextual interference by learning when to rely on parametric knowledge versus retrieved context. It introduces three decoding policies (PK, CK, RPK) and a joint reinforcement-learning objective with local and global advantages, plus an adaptive knowledge-balance modulation to keep PK as a viable fallback. Empirical results across five context scenarios show strong robustness to adversarial, conflicting, and irrelevant context, with substantial gains over RAG-based baselines and consistent performance in correct-context settings. The approach offers a principled, scalable framework for balancing internal knowledge with external signals in knowledge-intensive tasks, with practical implications for deploying robust RAG systems.
Abstract
Retrieval-augmented generation (RAG) improves performance on knowledge-intensive tasks but can be derailed by wrong, irrelevant, or conflicting retrieved text, causing models to rely on inaccurate evidence and cascade errors. We propose Knowledgeable-R1, a reinforcement-learning framework that explicitly trains large language models to use parametric knowledge (PK) to resist contextual interference while still exploiting external context when it is reliably helpful. Knowledgeable-R1 introduces a joint sampling scheme that generates paired responses with and without retrieval, and learns both local advantages (within each decoding regime) and global advantages (across regimes under the same input) to quantify when to ignore misleading context and when to adopt it. We employ an asymmetric advantage transformation that amplifies exploratory behaviors toward parametric knowledge. Experiments show that Knowledgeable-R1 significantly improves robustness and reasoning accuracy in both knowledge-conflict and general RAG scenarios, outperforming state-of-the-art baselines by 23% in counterfactual scenarios without degrading performance when the retrieved context is fully accurate. Our code is available at https://github.com/lcy80366872/knowledgeable-R1.
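The local and global advantages described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this assumes a group-relative, mean-and-standard-deviation normalization of rewards; the function names, the `boost` parameter, the choice to scale only positive advantages, and the toy rewards are all hypothetical and are not taken from the released code.

```python
from statistics import mean, pstdev

def normalize(rewards, pool, eps=1e-6):
    """Center and scale rewards by the mean and std of a reference pool."""
    mu = mean(pool)
    sigma = pstdev(pool)
    return [(r - mu) / (sigma + eps) for r in rewards]

def local_advantages(rewards):
    # Assumption: "local" compares a response only with samples
    # drawn under the same decoding regime.
    return normalize(rewards, rewards)

def global_advantages(rewards, other_rewards):
    # Assumption: "global" compares a response with all samples for the
    # same input, pooled across the with- and without-retrieval regimes.
    return normalize(rewards, rewards + other_rewards)

def asymmetric(advantages, boost=2.0):
    # Hypothetical asymmetric transformation: amplify positive advantages
    # only, leaving negative ones unchanged.
    return [a * boost if a > 0 else a for a in advantages]

# Toy rewards for one question (1.0 = correct answer, 0.0 = wrong answer).
pk_rewards = [1.0, 1.0, 0.0, 1.0]  # sampled without retrieved context
ck_rewards = [0.0, 0.0, 1.0, 0.0]  # sampled with misleading retrieved context

pk_local = local_advantages(pk_rewards)
pk_global = asymmetric(global_advantages(pk_rewards, ck_rewards))
ck_global = global_advantages(ck_rewards, pk_rewards)
```

In this toy case the context-free samples are correct more often, so their pooled (global) advantages are positive on average while those of the context-conditioned samples are negative, which is the kind of signal that would push a policy to ignore misleading context.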
