Knowledge Localization: Mission Not Accomplished? Enter Query Localization!
Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
TL;DR
This work challenges the Knowledge Localization (KL) assumption by showing that many facts stored in large language models do not localize to a fixed set of knowledge neurons (KNs). It introduces the Query Localization (QL) assumption, which comprises query-KN mapping and dynamic KN selection, to better capture how knowledge is stored and expressed, including the role of attention. Through statistical analysis and modification-based experiments across multiple models on the ParaRel dataset, the authors demonstrate that inconsistent knowledge (KI) is widespread and show that KL is a simplification of QL. They further propose Consistency-Aware KN Modification (CAS), which leverages QL to improve knowledge editing, achieving better generalization and lower disruption, thereby validating QL’s utility for both understanding and modifying knowledge in LLMs. The work provides a path toward more robust and interpretable knowledge manipulation in AI systems and outlines future directions for integrating attention into knowledge editing paradigms.
Abstract
Large language models (LLMs) store extensive factual knowledge, but the mechanisms by which they store and express this knowledge remain unclear. The Knowledge Neuron (KN) thesis is a prominent theory for explaining these mechanisms. This theory is based on the Knowledge Localization (KL) assumption, which suggests that a fact can be localized to a few knowledge storage units, namely knowledge neurons. However, this assumption has two limitations: first, it may be too rigid regarding knowledge storage, and second, it neglects the role of the attention module in knowledge expression. In this paper, we first re-examine the KL assumption and demonstrate that these limitations do exist. To address them, we then present two new findings, each targeting one of the limitations: one focusing on knowledge storage and the other on knowledge expression. We summarize these findings as the Query Localization (QL) assumption and argue that the KL assumption can be viewed as a simplification of the QL assumption. Based on the QL assumption, we further propose the Consistency-Aware KN modification method, which improves the performance of knowledge modification, further validating our new assumption. We conduct 39 sets of experiments, along with additional visualization experiments, to rigorously confirm our conclusions. Code is available at https://github.com/heng840/KnowledgeLocalization.
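To make the contrast between the KL and QL assumptions concrete, the sketch below shows one way to check whether paraphrased queries about the same fact point to the same knowledge neurons. It is illustrative only: the function name `kn_consistency`, the `(layer, index)` neuron representation, the intersection-over-union score, and the toy neuron sets are all assumptions made for this example, not the metric or data used in the paper or its repository. Under KL, every paraphrase of a fact should yield roughly the same neuron set (score near 1); a score near 0 would indicate the query-dependent behavior that QL describes.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a neuron is identified by (layer index, neuron index).
Neuron = Tuple[int, int]


def kn_consistency(kn_sets: List[Set[Neuron]]) -> float:
    """Intersection-over-union of KN sets obtained from paraphrased queries.

    Illustrative score only (an assumption for this sketch): 1.0 means every
    query localizes to the same neurons, 0.0 means no neuron is shared by all.
    """
    if not kn_sets:
        raise ValueError("need at least one KN set")
    union = set().union(*kn_sets)
    if not union:
        return 0.0
    shared = set.intersection(*kn_sets)
    return len(shared) / len(union)


# Toy neuron sets for three paraphrases of one fact (made-up values).
consistent_fact = [
    {(3, 10), (5, 42)},
    {(3, 10), (5, 42)},
    {(3, 10), (5, 42), (7, 1)},
]
inconsistent_fact = [
    {(3, 10), (5, 42)},
    {(2, 8), (9, 77)},
    {(3, 10), (6, 5)},
]

print(f"consistent fact:   {kn_consistency(consistent_fact):.2f}")
print(f"inconsistent fact: {kn_consistency(inconsistent_fact):.2f}")
```

In practice the per-query neuron sets would come from a KN attribution method applied to each paraphrase, and a threshold on the score would separate consistent from inconsistent facts; both choices are left open here.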
