CoRect: Context-Aware Logit Contrast for Hidden State Rectification to Resolve Knowledge Conflicts
Xuhua Ma, Richong Zhang, Zhijie Nie
TL;DR
Retrieval-augmented generation can hallucinate when retrieved evidence conflicts with internal priors. The paper identifies Parametric Suppression in deep FFNs and introduces CoRect, a training-free, target-agnostic hidden-state rectification that localizes conflict-inducing layers and neutralizes their bias via stage-wise interventions grounded in the input context. A two-stage pipeline selects a trustworthy target token using Contextual Mutual Information and an attention filter, then rectifies suppressed hidden states by aligning perturbations with the unembedding vector to ensure a positive logit shift $\Delta z_L(\tilde{t}^*)$. Across QA and summarization benchmarks, CoRect improves faithfulness and reduces hallucinations relative to decoding-time baselines and FFN-editing methods, with robust gains on high-conflict data such as NQ-Swap.
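To see why aligning a perturbation with the unembedding vector yields a positive logit shift, consider the following minimal sketch. It assumes the target token's logit is a plain dot product between its unembedding row and the final hidden state, and it ignores the final layer normalization. The names `rectify` and `alpha` and the toy vectors are illustrative assumptions, not the paper's implementation. Under these assumptions, adding `alpha * u` to the hidden state changes the logit by `alpha * ||u||^2`, which is positive for any `alpha > 0`.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rectify(hidden, unembed_row, alpha):
    """Shift `hidden` along the target token's unembedding row.

    Hypothetical helper: `alpha` is an assumed positive step size.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive to guarantee a positive shift")
    return [h + alpha * u for h, u in zip(hidden, unembed_row)]


# Toy values (assumptions for illustration only).
hidden = [0.5, -1.0, 2.0]          # final hidden state
unembed_target = [1.0, 2.0, -1.0]  # unembedding row of the selected target token
alpha = 0.1

z_before = dot(unembed_target, hidden)
z_after = dot(unembed_target, rectify(hidden, unembed_target, alpha))
delta_z = z_after - z_before       # equals alpha * ||unembed_target||^2 up to rounding
```

With a real model the final normalization layer sits between the hidden state and the unembedding, so the shift is no longer exactly `alpha * ||u||^2`; the sketch only shows the direction of the effect.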
Abstract
Retrieval-Augmented Generation (RAG) often struggles with knowledge conflicts, where model-internal parametric knowledge overrides retrieved evidence, leading to unfaithful outputs. Existing approaches are limited: they rely either on superficial decoding-time adjustments or on weight editing that requires ground-truth targets. Through layer-wise analysis, we attribute this failure to a parametric suppression phenomenon: in deep layers, certain feed-forward network (FFN) layers overwrite context-sensitive representations with memorized priors. To address this, we propose CoRect (Context-Aware Logit Contrast for Hidden State Rectification). By contrasting logits from contextualized and non-contextualized forward passes, CoRect identifies layers that exhibit high parametric bias without requiring ground-truth labels. It then rectifies the hidden states to preserve evidence-grounded information. Across question answering (QA) and summarization benchmarks, CoRect consistently improves faithfulness and reduces hallucinations compared to strong baselines.
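The layer-localization idea can be illustrated with a small sketch. It assumes that per-layer logits are already available for two forward passes (one with the retrieved context, one without), for example by projecting each layer's hidden state through the unembedding matrix, and that a context-supported token index has already been selected. The scoring rule below (flag a layer when the contrast for that token drops sharply relative to the previous layer) and the names `contrast_by_layer`, `flag_suppressing_layers`, and `drop_threshold` are hypothetical; the abstract does not specify the exact criterion CoRect uses.

```python
def contrast_by_layer(logits_ctx, logits_noctx, token):
    """Per-layer gap between the with-context and without-context logit of `token`."""
    return [ctx[token] - noctx[token] for ctx, noctx in zip(logits_ctx, logits_noctx)]


def flag_suppressing_layers(contrast, drop_threshold):
    """Indices of layers whose contrast falls by more than `drop_threshold`
    relative to the preceding layer (an assumed, illustrative criterion)."""
    return [
        i for i in range(1, len(contrast))
        if contrast[i - 1] - contrast[i] > drop_threshold
    ]


# Toy per-layer logits over a 3-token vocabulary (made-up numbers).
# Token 1 is the context-supported token; token 0 is the memorized prior.
logits_ctx = [
    [0.0, 0.5, 0.1],
    [0.1, 1.5, 0.2],
    [0.2, 2.5, 0.3],
    [1.5, 0.8, 0.2],  # the prior takes over in the last layer
]
logits_noctx = [
    [0.0, 0.1, 0.1],
    [0.1, 0.2, 0.2],
    [0.3, 0.3, 0.3],
    [1.8, 0.2, 0.2],
]

contrast = contrast_by_layer(logits_ctx, logits_noctx, token=1)
flagged = flag_suppressing_layers(contrast, drop_threshold=1.0)
```

In this toy setup the contrast for the context-supported token grows over the first three layers and collapses in the last one, so only the last layer is flagged. No ground-truth answer is needed, only the two sets of logits and a token chosen from the context.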
