Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models

Chenhui Hu; Pengfei Cao; Yubo Chen; Kang Liu; Jun Zhao

TL;DR

This work studies lifelong knowledge editing in large language models. It extends the standard closed-form update for linear associative memory, $W \leftarrow W + \Lambda(C^{-1} k_e)^T$, from a single edit to a sequence of edits and identifies an interference term that accumulates as edits are applied. The authors show that this interference is governed by knowledge superposition, that is, by non-orthogonal knowledge representations in whitening space; when representations are perfectly orthogonal in whitening space, the interference term vanishes and lifelong editing becomes lossless. Empirically, they find knowledge superposition across model families (GPT-2, Pythia, Llama) and across layers. The superposition values follow zero-mean, heavy-tailed distributions with high kurtosis, and they obey a scaling law in which larger models exhibit weaker superposition. Together, the theory and experiments explain why current lifelong editing methods struggle and point to possible mitigations, such as editing in whitening space or decomposing knowledge to reduce interference.
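
To make the role of whitening space concrete, the short derivation below sketches why a single update disturbs an unrelated key only through an inner product in whitening space. It assumes $C$ is symmetric positive definite, so that $C^{-1/2}$ exists. The symbol $k$ (an arbitrary key that was not edited) and the tilde notation are introduced here for illustration and may not match the paper's notation.

```latex
\begin{aligned}
\hat{W} k &= \left( W + \Lambda (C^{-1} k_e)^T \right) k \\
          &= W k + \Lambda \, k_e^T C^{-1} k
             && \text{(since } C^{-1} \text{ is symmetric)} \\
          &= W k + \Lambda \, (C^{-1/2} k_e)^T (C^{-1/2} k) \\
          &= W k + \Lambda \, \tilde{k}_e^T \tilde{k},
             && \tilde{k} := C^{-1/2} k .
\end{aligned}
```

Under these assumptions, the output for $k$ is left unchanged whenever $\tilde{k}_e^T \tilde{k} = 0$, which matches the statement that orthogonality in whitening space makes editing lossless.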

Abstract

Knowledge editing aims to update outdated or incorrect knowledge in large language models (LLMs). However, current knowledge editing methods have limited scalability for lifelong editing. This study explores the fundamental reason why knowledge editing fails in lifelong editing. We begin with the closed-form solution derived from linear associative memory, which underpins state-of-the-art knowledge editing methods. We extend the solution from single editing to lifelong editing, and through rigorous mathematical derivation, identify an interference term in the final solution, suggesting that editing knowledge may impact irrelevant knowledge. Further analysis of the interference term reveals a close relationship with superposition between knowledge representations. When knowledge superposition does not exist in language models, the interference term vanishes, allowing for lossless knowledge editing. Experiments across numerous language models reveal that knowledge superposition is universal, exhibiting high kurtosis, zero mean, and heavy-tailed distributions with clear scaling laws. Ultimately, by combining theory and experiments, we demonstrate that knowledge superposition is the fundamental reason for the failure of lifelong editing. Moreover, this is the first study to investigate knowledge editing from the perspective of superposition and provides a comprehensive observation of superposition across numerous real-world language models. Code available at https://github.com/ChenhuiHu/knowledge_in_superposition.
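
The interference described in the abstract can be illustrated numerically. The sketch below applies the rank-one update $W \leftarrow W + \Lambda(C^{-1}k_e)^T$ once per edit on a toy linear memory and then checks whether earlier edits still hold. Everything here is an assumption made for illustration: the function names, the toy keys and values, the choice $\Lambda = (v_e - W k_e) / ((C^{-1}k_e)^T k_e)$ (the usual single-edit form, which may differ in detail from the paper's lifelong derivation), and a diagonal $C$ chosen only to keep the inverse trivial.

```python
# Toy sketch of sequential closed-form edits on a linear associative memory.
# C is diagonal here (an illustrative simplification), so C^{-1} k is elementwise.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edit(W, c_diag, k_e, v_e):
    """Apply one update W + Lambda (C^{-1} k_e)^T so the new W maps k_e to v_e."""
    u = [k / c for k, c in zip(k_e, c_diag)]               # C^{-1} k_e
    residual = [v - y for v, y in zip(v_e, matvec(W, k_e))]
    scale = dot(u, k_e)                                    # (C^{-1} k_e)^T k_e
    lam = [r / scale for r in residual]                    # Lambda
    return [[w + l * uj for w, uj in zip(row, u)] for row, l in zip(W, lam)]

def lifelong_error(keys, values, c_diag):
    """Edit the keys one by one, then return the largest deviation of any
    edited key from its target value after all edits are done."""
    W = [[0.0] * len(keys[0]) for _ in range(len(values[0]))]
    for k, v in zip(keys, values):
        W = edit(W, c_diag, k, v)
    return max(
        abs(y - t)
        for k, v in zip(keys, values)
        for y, t in zip(matvec(W, k), v)
    )

c_diag = [1.0, 4.0, 9.0]

# Keys with k_i^T C^{-1} k_j = 0 for i != j: orthogonal in whitening space,
# even though the first two are not orthogonal in the original space.
ortho_keys = [[1.0, 2.0, 0.0], [1.0, -2.0, 0.0], [0.0, 0.0, 1.0]]
ortho_vals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ortho_err = lifelong_error(ortho_keys, ortho_vals, c_diag)

# Keys that overlap in whitening space: the second edit disturbs the first.
overlap_keys = [[1.0, 2.0, 0.0], [1.0, 0.0, 0.0]]
overlap_vals = [[1.0, 0.0], [0.0, 1.0]]
overlap_err = lifelong_error(overlap_keys, overlap_vals, c_diag)

print("orthogonal in whitening space:", ortho_err)
print("overlapping in whitening space:", overlap_err)
```

In this toy setting the first error stays at floating-point noise while the second is clearly nonzero, which is the qualitative behavior the interference term predicts.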

Paper Structure (24 sections, 30 equations, 32 figures, 4 tables)

Introduction
Related Work
Knowledge Editing
Superposition
Preliminary
Expanding to Lifelong Editing
Interference Term of Original Knowledge $\Delta_o$
Interference Term of Edited Knowledge $\Delta_e$
How to Understand $p(\cdot,\cdot)$
Knowledge in Superposition
Universal in Language Models
Heavy-Tailed Distribution in Language Models
Scaling Law for Superposition
Superposition in Whitening Space
Conclusion
...and 9 more sections

Figures (32)

Figure 1: Illustration of our work. (a) Current knowledge editing methods use a unified closed-form solution, which adds $\Lambda(C^{-1}k_e)^T$ to the parameter matrix $W$ to update knowledge. (b) The closed-form solution extended to lifelong editing, where $W_n$ represents the parameter matrix after the $n$-th edit. (c) Once the interference term accumulates sufficiently, language models forget knowledge. (d) Superposition term, where $p(\cdot,\cdot)$ denotes the degree of superposition between two knowledge representations. (e) The superposition term determines the interference term.
Figure 2: A neural network with only three neurons, corresponding to three dimensions, (a) can directly represent three features orthogonally, but (b) to represent six (or more) features, it uses superposition to encode them noisily and nearly orthogonally.
Figure 3: Superposition at layer 0 across different language models, visualized using $P$ matrices and ordered by model size. Each point in these $128 \times 128$ $P$ matrices corresponds to the $p(\cdot,\cdot)$ value between two pieces of knowledge.
Figure 4: Kernel density estimates (KDE) of $p(\cdot,\cdot)$ values in the $P$ matrices at layer 0 across different language models, ordered by model size. In the $P$ matrices, $p(\cdot,\cdot)$ values are concentrated around 0, with high kurtosis that increases as model size grows.
Figure 5: The scaling law of knowledge superposition. Higher kurtosis means less superposition.
...and 27 more figures
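
The captions for Figures 4 and 5 use kurtosis as a summary of how tightly $p(\cdot,\cdot)$ values concentrate around 0. The sketch below shows how such a statistic could be computed. It assumes the Pearson convention (fourth central moment divided by the squared variance, so a normal distribution scores 3); the paper may use a different convention. The two sample lists are made-up toy values, not data from the paper, and the function name is hypothetical.

```python
# Toy sketch: kurtosis as a measure of concentration around zero with rare outliers.
# Pearson convention is assumed here (normal distribution -> 3).

def pearson_kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / (m2 ** 2)

# Made-up values: almost every entry is 0, with two rare large entries.
concentrated = [0.0] * 98 + [5.0, -5.0]

# Made-up values: every entry has the same magnitude, none are near 0.
spread = [-1.0, 1.0] * 50

kurt_concentrated = pearson_kurtosis(concentrated)
kurt_spread = pearson_kurtosis(spread)

print("mostly zero, rare large values:", kurt_concentrated)
print("uniform magnitude:", kurt_spread)
```

Both toy lists have zero mean, but the first, where most values sit at 0 and only a few are large, yields a far higher kurtosis. This is the sense in which the Figure 5 caption reads higher kurtosis as less superposition.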

Theorems & Definitions (1)

Definition 1: Matrix Whitening
