Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation

Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz

TL;DR

The paper tackles the challenge of attributing RAG-generated content to retrieved context by introducing ARC-JSD, an inference-time Jensen-Shannon Divergence based method that identifies the most influential context sentences without fine-tuning or surrogate models. It demonstrates that ARC-JSD achieves higher context-attribution accuracy and up to threefold speedups across TyDi QA, Hotpot QA, and MuSiQue benchmarks, compared to prior baselines. Beyond attribution accuracy, the authors perform a mechanistic analysis by coupling ARC-JSD with Logit Lens to locate specific attention heads and MLP layers driving attribution, revealing consistent higher-layer involvement and enabling gating strategies to mitigate hallucinations. The work further validates its findings through semantic gains and consensus fusion, provides qualitative case studies and visualisations, and supplies a public codebase, offering a practical, interpretable framework for auditing grounding in RAG systems.

Abstract

Retrieval-Augmented Generation (RAG) leverages large language models (LLMs) combined with external contexts to enhance the accuracy and reliability of generated responses. However, reliably attributing generated content to specific context segments (context attribution) remains challenging because current methods are computationally intensive and often require extensive fine-tuning or human annotation. In this work, we introduce a novel Jensen-Shannon Divergence driven method to Attribute Response to Context (ARC-JSD), enabling efficient and accurate identification of essential context sentences without additional fine-tuning, gradient calculation or surrogate modelling. Evaluations on a wide range of RAG benchmarks, such as TyDi QA, Hotpot QA, and MuSiQue, using instruction-tuned LLMs at different scales demonstrate superior accuracy and significant computational efficiency improvements compared to the previous surrogate-based method. Furthermore, our mechanistic analysis reveals specific attention heads and multilayer perceptron (MLP) layers responsible for context attribution, providing valuable insights into the internal workings of RAG models and how they affect RAG behaviours. Our code is available at https://github.com/ruizheliUOA/ARC_JSD.
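
For reference, the divergence named in the abstract can be written out as follows. The first two lines are the standard definition of the Jensen-Shannon divergence between two distributions $P$ and $Q$. The last line is only a sketch of how a per-sentence attribution score might be formed from it: the token-wise sum over the response and the symbols $r_t$, $r_{<t}$ and $\mathrm{score}(c_i)$ are assumptions introduced here for illustration, not notation confirmed by this page.

```latex
\begin{align}
\mathrm{JSD}(P \,\|\, Q) &= \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q), \\
\mathrm{KL}(P \,\|\, M) &= \sum_{v} P(v)\,\log\frac{P(v)}{M(v)}, \\
% Assumed aggregation: sum the per-token JSD over the response tokens r_1, ..., r_{|\mathcal{R}|}.
\mathrm{score}(c_i) &= \sum_{t=1}^{|\mathcal{R}|}
\mathrm{JSD}\!\Big(\mathcal{P}_{\text{LM}}(r_t \mid \mathcal{C}, \mathcal{Q}, r_{<t}) \,\Big\|\,
\mathcal{P}_{\text{LM}}(r_t \mid \mathcal{C}_{\text{ABLATE}}(c_i), \mathcal{Q}, r_{<t})\Big)
\end{align}
```

With the natural logarithm, the divergence is symmetric and bounded between $0$ and $\log 2$, so scores for different context sentences are directly comparable.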

Paper Structure

This paper contains 29 sections, 13 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of how ARC-JSD works: (a) a RAG LLM $\mathcal{P}_{\text{LM}}(\cdot)$ first generates a response $\mathcal{R}$ conditioned on the full context $\mathcal{C}$ and the query $\mathcal{Q}$; (b) by ablating one context sentence at a time, we calculate the probability distribution of the same response $\mathcal{R}$ conditioned on the ablated context $\mathcal{C}_{\text{ABLATE}}(c_i)$ and the query $\mathcal{Q}$; (c) we then calculate JSD scores between the probability distributions of the same response $\mathcal{R}$ conditioned on the full context and on each ablated context, and locate the context sentence most relevant to $\mathcal{R}$ as the one with the highest JSD score. We then apply the JSD-based metric to the internal components of the RAG model: (d) for each attention head or MLP output at each layer, we calculate the probability distribution of the same response $\mathcal{R}$ conditioned on the same query $\mathcal{Q}$ with the full context $\mathcal{C}$ and with the ablated context $\mathcal{C}_{\text{ABLATE}}(c_{\text{top-1}})$, obtained by removing the top relevant context sentence identified in § \ref{sec:ARC+JSD}; (e) we locate the top-$N$ attention heads or MLPs that contribute to context attribution by ranking the collected JSD scores in descending order.
  • Figure 2: (a) The compute-accuracy trade-off on MuSiQue for 4 baselines and ARC-JSD on 4 LLM backbones, with GFLOPs per sample on a $\log_{10}$ scale; (b) The average JSD score of the attention heads and MLPs of Qwen2-1.5B-IT on TyDi QA across all layers. Deeper colour indicates larger JSD scores.
  • Figure 3: The projection of $\mathbf{x}_i^{\ell,\text{mid}}$ and $\mathbf{x}_i^{\ell,\text{post}}$ via Logit Lens to vocabulary space from layer 20 to layer 27 of Qwen2-1.5B-IT for a TyDi QA data sample, where the generated response $\mathcal{R}$ is "A mosquito has two wings." (See Appendix \ref{app:case_studies} for all layer projections.) Each cell shows the most probable token decoded via Logit Lens. The colour indicates the probability of the decoded token for the corresponding $\mathbf{x}_i^{\ell,\text{mid}}$ or $\mathbf{x}_i^{\ell,\text{post}}$.
  • Figure 4: The compute-accuracy trade-off on MuSiQue for different metrics and ARC-JSD on 4 LLM backbones, with GFLOPs per sample on a $\log_{10}$ scale.
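
Steps (a) to (c) of the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `response_dists` is a hypothetical callable standing in for the LLM (it returns one next-token distribution per response token for a given context), the per-token JSD values are summed over the response as an assumed aggregation, and `toy_response_dists` with its three-word vocabulary and example sentences is invented purely so the sketch runs without a model.

```python
import math
from typing import Callable, List, Sequence

Dist = Sequence[float]  # one probability distribution over the vocabulary


def kl(p: Dist, q: Dist) -> float:
    """KL(p || q) with the natural log; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def attribute(
    sentences: List[str],
    response_dists: Callable[[List[str]], List[Dist]],
) -> List[float]:
    """Score each context sentence by ablating it and measuring the shift.

    `response_dists(context)` is a hypothetical stand-in for the model: it
    returns the per-token distributions of the SAME fixed response given
    `context`. Summing JSD over response tokens is an assumption.
    """
    full = response_dists(sentences)  # step (a): full context
    scores = []
    for i in range(len(sentences)):
        ablated = sentences[:i] + sentences[i + 1:]  # step (b): drop sentence i
        abl = response_dists(ablated)
        scores.append(sum(jsd(p, q) for p, q in zip(full, abl)))  # step (c)
    return scores


def toy_response_dists(context: List[str]) -> List[Dist]:
    """Invented toy model: confident only when the key sentence is present."""
    if "Mosquitoes have two wings." in context:
        return [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
    return [[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]]


context = [
    "Mosquitoes are insects.",
    "Mosquitoes have two wings.",
    "They live near water.",
]
scores = attribute(context, toy_response_dists)
top = max(range(len(scores)), key=scores.__getitem__)
print(context[top])
```

In a real setting, `response_dists` would wrap a forward pass of the RAG model that teacher-forces the original response, so attributing over a context of n sentences costs n + 1 forward passes and needs no gradients or surrogate model. Step (d) applies the same `jsd` comparison to distributions decoded from individual attention heads or MLP outputs rather than from the final layer.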