Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz
TL;DR
The paper tackles the challenge of attributing RAG-generated content to retrieved context by introducing ARC-JSD, an inference-time Jensen-Shannon Divergence based method that identifies the most influential context sentences without fine-tuning or surrogate models. It demonstrates that ARC-JSD achieves higher context-attribution accuracy and up to threefold speedups across TyDi QA, Hotpot QA, and MuSiQue benchmarks, compared to prior baselines. Beyond attribution accuracy, the authors perform a mechanistic analysis by coupling ARC-JSD with Logit Lens to locate specific attention heads and MLP layers driving attribution, revealing consistent higher-layer involvement and enabling gating strategies to mitigate hallucinations. The work further validates its findings through semantic gains and consensus fusion, provides qualitative case studies and visualisations, and supplies a public codebase, offering a practical, interpretable framework for auditing grounding in RAG systems.
Abstract
Retrieval-Augmented Generation (RAG) leverages large language models (LLMs) combined with external contexts to enhance the accuracy and reliability of generated responses. However, reliably attributing generated content to specific context segments, a task known as context attribution, remains challenging because current methods are computationally intensive and often require extensive fine-tuning or human annotation. In this work, we introduce a novel Jensen-Shannon Divergence driven method to Attribute Response to Context (ARC-JSD), enabling efficient and accurate identification of essential context sentences without additional fine-tuning, gradient calculation, or surrogate modelling. Evaluations on a wide range of RAG benchmarks, such as TyDi QA, Hotpot QA, and MuSiQue, using instruction-tuned LLMs of different scales demonstrate superior accuracy and significant computational efficiency improvements compared to the previous surrogate-based method. Furthermore, our mechanistic analysis reveals specific attention heads and multilayer perceptron (MLP) layers responsible for context attribution, providing valuable insights into the internal workings of RAG models and how they affect RAG behaviours. Our code is available at https://github.com/ruizheliUOA/ARC_JSD.
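To make the idea of divergence-based attribution concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes, for illustration only, that a context sentence is scored by the Jensen-Shannon divergence between the response-token distributions obtained with the full context and with that sentence removed. The names `attribute`, `toy_model`, and the three-token vocabulary are hypothetical stand-ins for a real LLM forward pass.

```python
import math

def kl(p, q):
    # KL(p || q) in nats; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # Jensen-Shannon divergence: symmetric and bounded by ln(2).
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def attribute(sentences, response_dists):
    # Assumed scoring scheme: remove one sentence at a time and sum the JSD
    # over response tokens between the full and ablated distributions.
    full = response_dists(sentences)
    scores = []
    for i in range(len(sentences)):
        ablated = sentences[:i] + sentences[i + 1:]
        abl = response_dists(ablated)
        scores.append(sum(jsd(p, q) for p, q in zip(full, abl)))
    return scores

KEY = "Paris is the capital of France."

def toy_model(context):
    # Hypothetical stand-in for an LLM: returns one distribution per response
    # token over a 3-token vocabulary. It is confident only if KEY is present.
    if KEY in context:
        return [[0.9, 0.05, 0.05]]
    return [[0.4, 0.3, 0.3]]

context = ["The Seine flows through several regions.", KEY, "France uses the euro."]
scores = attribute(context, toy_model)
top = scores.index(max(scores))  # index of the most influential sentence
```

In this toy setup only the removal of the key sentence changes the output distribution, so it receives the single non-zero score. With a real model, `response_dists` would be replaced by forward passes that return the next-token distributions at each response position, and no gradients or fine-tuning would be involved.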
