CoRect: Context-Aware Logit Contrast for Hidden State Rectification to Resolve Knowledge Conflicts

Xuhua Ma, Richong Zhang, Zhijie Nie

TL;DR

Retrieval-augmented generation can hallucinate when retrieved evidence conflicts with internal priors. The paper identifies Parametric Suppression in deep FFNs and introduces CoRect, a training-free, target-agnostic hidden-state rectification that localizes conflict-inducing layers and neutralizes their bias via stage-wise interventions grounded in the input context. A two-stage pipeline selects a trustworthy target token using Contextual Mutual Information and an attention filter, then rectifies suppressed hidden states by aligning perturbations with the unembedding vector to ensure a positive logit shift $\Delta z_L(\tilde{t}^*)$. Across QA and summarization benchmarks, CoRect improves faithfulness and reduces hallucinations relative to decoding-time baselines and FFN-editing methods, with robust gains on high-conflict data such as NQ-Swap.
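
The following is a rough illustration of why a perturbation aligned with the unembedding vector raises the target logit. It assumes a simplified linear readout that ignores the final normalization layer, and it is not the paper's exact formulation. The symbols $h_L$ (final hidden state), $w_t$ (unembedding vector of token $t$), and $\alpha$ (step size) are introduced for this sketch only.

$$
z_L(t) = w_t^\top h_L, \qquad
h_L' = h_L + \alpha \frac{w_{\tilde{t}^*}}{\lVert w_{\tilde{t}^*} \rVert}, \quad \alpha > 0
$$

$$
\Delta z_L(\tilde{t}^*) = w_{\tilde{t}^*}^\top \left( h_L' - h_L \right) = \alpha \lVert w_{\tilde{t}^*} \rVert > 0
$$

Under this assumption, any perturbation $\delta$ with $w_{\tilde{t}^*}^\top \delta > 0$ increases the target logit, and by the Cauchy-Schwarz inequality the aligned direction gives the largest increase for a fixed perturbation norm. With a normalization layer before the readout, the relation holds only approximately.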

Abstract

Retrieval-Augmented Generation (RAG) often struggles with knowledge conflicts, where model-internal parametric knowledge overrides retrieved evidence, leading to unfaithful outputs. Existing approaches are often limited, relying either on superficial decoding adjustments or weight editing that necessitates ground-truth targets. Through layer-wise analysis, we attribute this failure to a parametric suppression phenomenon: specifically, in deep layers, certain FFN layers overwrite context-sensitive representations with memorized priors. To address this, we propose CoRect (Context-Aware Logit Contrast for Hidden State Rectification). By contrasting logits from contextualized and non-contextualized forward passes, CoRect identifies layers that exhibit high parametric bias without requiring ground-truth labels. It then rectifies the hidden states to preserve evidence-grounded information. Across question answering (QA) and summarization benchmarks, CoRect consistently improves faithfulness and reduces hallucinations compared to strong baselines.
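
The label-free localization idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's scoring rule. It assumes per-layer hidden states at the final position are already available from two forward passes (with and without the retrieved context) and projects them through the unembedding matrix in logit-lens style. The function names `contrast_by_layer` and `top_suppressing_layers`, the array shapes, and the "largest drop" criterion are all hypothetical choices made for this example.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def contrast_by_layer(h_ctx, h_noctx, W_U, token_id):
    """Per-layer contrast score for one token (hypothetical helper).

    h_ctx, h_noctx: (num_layers, d_model) hidden states at the final
        position, from the pass with context and the pass without it.
    W_U: (vocab, d_model) unembedding matrix.
    Returns log p(token | with context) - log p(token | without context)
    at every layer.
    """
    lp_ctx = log_softmax(h_ctx @ W_U.T)      # (num_layers, vocab)
    lp_noctx = log_softmax(h_noctx @ W_U.T)  # (num_layers, vocab)
    return lp_ctx[:, token_id] - lp_noctx[:, token_id]

def top_suppressing_layers(contrast, k):
    """Indices of up to k layers whose update most reduces the contrast.

    Assumption: a layer is flagged when the score drops between its
    input (the previous layer) and its output.
    """
    drops = np.diff(contrast)        # drops[i] = contrast[i+1] - contrast[i]
    order = np.argsort(drops)[:k]    # most negative change first
    return [int(i) + 1 for i in order if drops[i] < 0]
```

In a toy case where the context-supported token gains probability over the first layers and then loses it in the last one, only the last layer is flagged. No ground-truth answer is needed, because the score depends only on the difference between the two passes.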

Paper Structure

This paper contains 53 sections, 32 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparison of intervention strategies. (a) Our method localizes and rectifies parametric suppression layers within the model's internal residual stream. (b) Baseline methods treat the model as a black box by intervening only at the output stage.
  • Figure 2: Rank evolution analysis. (Top) Middle layer flip pattern. (Bottom) Last layer flip pattern.
  • Figure 3: The overall architecture of our proposed model.
  • Figure 4: Performance metrics across different layer aggregation scales. The F1 score (blue) validates the localization accuracy by comparing ROME-identified layers $l^*$ with our defined layers $l$, while the accuracy (red) illustrates the performance stability after applying our correction method.
  • Figure 5: Hyperparameter sensitivity and performance analysis. (a) Accuracy with varying number of layers $K$ on NQ. (b) Effect of layer depth on generation scores on XSUM. (c) Impact of attention weight ($\lambda$) on model accuracy on NQ. (d) Impact of attention weight ($\lambda$) on generation scores on XSUM.
  • ...and 4 more figures