Analysing the Residual Stream of Language Models Under Knowledge Conflicts

Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini

TL;DR

The paper analyzes how large language models handle conflicts between their parametric knowledge and contextual information. By examining the residual stream in Transformer architectures and applying linear probing, it demonstrates that a mid-layer signal of knowledge conflict can be detected without changing inputs or model parameters, achieving high accuracy in identifying conflicts. It also shows distinct skewness patterns in residual activations depending on whether contextual or parametric knowledge is used, and demonstrates that the model’s knowledge-source choice is predictable after conflict detection. These findings offer mechanistic insights into knowledge management in LLMs and lay the groundwork for inference-time interventions to steer knowledge selection before producing answers.

Abstract

Large language models (LLMs) can store a significant amount of factual knowledge in their parameters. However, their parametric knowledge may conflict with the information provided in the context. Such conflicts can lead to undesirable model behaviour, such as reliance on outdated or incorrect information. In this work, we investigate whether LLMs can identify knowledge conflicts and whether it is possible to know which source of knowledge the model will rely on by analysing the residual stream of the LLM. Through probing tasks, we find that LLMs can internally register the signal of knowledge conflict in the residual stream, which can be accurately detected by probing the intermediate model activations. This allows us to detect conflicts within the residual stream before the model generates an answer, without modifying the input or model parameters. Moreover, we find that the residual stream shows significantly different patterns when the model relies on contextual knowledge versus parametric knowledge to resolve conflicts. These patterns can be used to estimate the behaviour of LLMs when conflicts occur and to prevent unexpected answers before they are produced. Our analysis offers insights into how LLMs internally manage knowledge conflicts and provides a foundation for developing methods to control the knowledge selection process.
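
The probing idea in the abstract can be illustrated with a minimal sketch: a linear (logistic-regression) classifier trained on activation vectors to separate "conflict" from "no conflict" inputs. Everything below is an assumption made for illustration. The activations are synthetic Gaussian vectors rather than real residual-stream states, and the function names (`make_synthetic_activations`, `train_linear_probe`, `probe_accuracy`), dimensions, and hyperparameters are hypothetical and not taken from the paper.

```python
import numpy as np


def make_synthetic_activations(n=400, d=32, shift=3.0, seed=0):
    """Create toy 'activations' (NOT real model states) with binary labels.

    Label 1 stands in for 'knowledge conflict', label 0 for 'no conflict'.
    The two classes differ by a shift along one random unit direction.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    # Class 1 is shifted by +shift, class 0 by -shift along `direction`.
    X = rng.normal(size=(n, d)) + shift * np.outer(2 * y - 1, direction)
    return X, y


def train_linear_probe(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = probs - y  # gradient of the mean log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of examples where the probe's hard prediction matches y."""
    preds = (X @ w + b > 0).astype(int)
    return float((preds == y).mean())


if __name__ == "__main__":
    X, y = make_synthetic_activations()
    w, b = train_linear_probe(X[:300], y[:300])
    print("held-out accuracy:", probe_accuracy(w, b, X[300:], y[300:]))
```

In a real analysis, `X` would hold the activations collected at a chosen layer for each input, and one probe would be trained per layer and per activation type to see where the conflict signal becomes linearly decodable.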

Paper Structure

This paper contains 17 sections, 19 figures.

Figures (19)

  • Figure 1: Accuracy, AUROC, and AUPRC of probing models on detecting knowledge conflicts based on the activations of Llama3-8B. The probing results on hidden state, MLP, and Self-Attention activations are coloured red, blue, and green, respectively. More analysis is presented in \ref{sec:more-conflict-probing}.
  • Figure 2: Skewness of the hidden state activations of Llama3-8B in the presence of knowledge conflicts. Blue and red lines represent the skewness of hidden states from $D_{a_C}^{e_C}$ and $D_{a_M}^{e_C}$, respectively. Higher scores indicate a more skewed distribution. Additional analyses are available in \ref{sec:more-skewness-plots}.
  • Figure 3: Accuracy, AUROC, and AUPRC of probing models on predicting which source of knowledge the model will use to predict the answer in Llama3-8B. Skewness of the hidden state activations of Llama3-8B when the model uses knowledge from different sources to predict the answer. Additional results are available in \ref{sec:more-behaviour}.
  • Figure 4: Knowledge conflict probing results using Llama2-7B on NQSwap.
  • Figure 5: Knowledge conflict probing results using Llama2-7B on Macnoise.
  • ...and 14 more figures
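
The skewness comparison described in the Figure 2 caption can be sketched as a per-layer statistic over hidden-state vectors. The text here does not say which skewness measure the paper uses, so this sketch assumes the standard moment-based (Fisher-Pearson) coefficient. The function names and the `(num_layers, hidden_dim)` input layout are hypothetical.

```python
import numpy as np


def sample_skewness(x):
    """Moment-based skewness m3 / m2**1.5 of a 1-D array (assumed measure)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    if m2 == 0.0:
        return 0.0  # constant vector: define skewness as zero
    return float(m3 / m2 ** 1.5)


def layerwise_skewness(hidden_states):
    """Skewness of each layer's hidden-state vector.

    `hidden_states` is assumed to have shape (num_layers, hidden_dim),
    i.e. one activation vector per layer for a single input position.
    """
    return [sample_skewness(layer) for layer in np.asarray(hidden_states)]


if __name__ == "__main__":
    toy = np.array([
        [-1.0, 0.0, 1.0, 0.0],   # symmetric values
        [0.0, 0.0, 0.0, 10.0],   # one large positive outlier
    ])
    print(layerwise_skewness(toy))
```

Averaging these per-layer values separately over the two groups of examples would give two curves to compare across layers, in the spirit of the figure.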