Knowledge Localization: Mission Not Accomplished? Enter Query Localization!

Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

TL;DR

This work challenges the Knowledge Localization (KL) assumption by showing that many facts in large language models do not localize to a fixed set of knowledge neurons (KNs). It introduces Query Localization (QL), comprising query-KN mapping and dynamic KN selection, to better capture how knowledge is stored and expressed, including the role of attention. Through statistical analysis and modification-based experiments across multiple models and the ParaRel dataset, the authors demonstrate that inconsistent knowledge ($K_I$) is widespread and show that KL is a simplification of QL. They further propose Consistency-Aware KN Modification (CAS), which leverages QL to improve knowledge editing, achieving better generalization and lower disruption and thereby validating QL’s utility for both understanding and modifying knowledge in LLMs. The work provides a path toward more robust and interpretable knowledge manipulation in AI systems and outlines future directions for integrating attention into knowledge editing paradigms.

Abstract

Large language models (LLMs) store extensive factual knowledge, but the mechanisms behind how they store and express this knowledge remain unclear. The Knowledge Neuron (KN) thesis is a prominent theory for explaining these mechanisms. This theory is based on the Knowledge Localization (KL) assumption, which suggests that a fact can be localized to a few knowledge storage units, namely knowledge neurons. However, this assumption has two limitations: first, it may be too rigid regarding knowledge storage, and second, it neglects the role of the attention module in knowledge expression. In this paper, we first re-examine the KL assumption and demonstrate that its limitations do indeed exist. To address these, we then present two new findings, each targeting one of the limitations: one focusing on knowledge storage and the other on knowledge expression. We summarize these findings as the Query Localization (QL) assumption and argue that the KL assumption can be viewed as a simplification of the QL assumption. Based on the QL assumption, we further propose the Consistency-Aware KN modification method, which improves the performance of knowledge modification, further validating our new assumption. We conduct 39 sets of experiments, along with additional visualization experiments, to rigorously confirm our conclusions. Code is available at https://github.com/heng840/KnowledgeLocalization.

Paper Structure

This paper contains 50 sections, 26 equations, 11 figures, and 8 tables.

Figures (11)

  • Figure 1: Heatmaps of the neuron activation values, with darker colors indicating higher values (can be viewed as knowledge neurons). The left two heatmaps show neuron activations for two neighbor queries of $\langle \textit{Suleiman I, position, Shah}\rangle$ ($\text{Fact}_1$), while the right two correspond to $\langle \textit{Christoph Ahlhaus, position, mayor}\rangle$ ($\text{Fact}_2$).
  • Figure 2: The Query Localization assumption.
  • Figure 3: Violin plot for Consistency Analysis. The $x$-axis shows the fact relations, and the $y$-axis shows the $CS_2$ value. The width of each violin indicates the density of data at different $CS_2$ values. We select a threshold of 0.3 as an example; facts below this threshold are classified as $K_I$.
  • Figure 4: Results of Query-KN Mapping. "Enh" and "Sup" refer to enhancement and suppression of KN activation values, respectively, with "Avg" representing their average.
  • Figure 5: Heatmaps showing the neuron activation values after suppressing knowledge synapses. The queries used here are the same as those in Figure 1. The dark areas in Figure 1 appear lighter here, indicating a decrease in the activation value of knowledge neurons. For the enhanced case, see the corresponding heatmaps in the appendix on Supplementary Experimental Results.
  • ...and 6 more figures