Understanding the Collapse of LLMs in Model Editing

Wanli Yang; Fei Sun; Jiajun Tan; Xinyu Ma; Du Su; Dawei Yin; Huawei Shen

TL;DR

This work investigates why a single edit with Rank-One Model Editing (ROME) can cause a large language model to collapse. It identifies two root causes: inconsistent use of prefixed and unprefixed keys in the parameter update equation, which can make the denominator very small, and anomalous first-token representations, whose distribution differs from that of subsequent tokens in autoregressive transformers. The authors show that using prefixed keys consistently during editing prevents collapse but initially harms edit efficacy on collapse cases because editing and testing no longer match. Their remedy is to also prefix the collapse-case prompts during testing, which restores consistency and improves efficacy across GPT-2-XL, GPT-J, and Llama2-7b while preserving stability. The findings offer a practical path to safer, more reliable model editing in real-world LLM deployments, with clear directions for broader validation and deeper analysis of first-token dynamics.
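
The test-time remedy described above can be sketched as a small string-level helper. This is only an illustration under stated assumptions: the function name, the prefix text, and the example prompts are hypothetical, and checking whether the prompt starts with the subject string is a rough stand-in for "the subject is the first token" (a real check would operate on tokenizer output).

```python
def prefix_if_subject_first(prompt: str, subject: str, prefix: str) -> str:
    """Prepend `prefix` when the subject opens the prompt.

    Hypothetical helper: it approximates "the subject is the first token"
    with a plain string comparison instead of tokenizer output.
    """
    stripped = prompt.lstrip()
    if stripped.startswith(subject):
        # The subject would otherwise sit at position 0, so move it
        # behind a prefix, mirroring the prefixed keys used in editing.
        return f"{prefix} {stripped}"
    return prompt


# Hypothetical prompts, not taken from the paper's datasets.
collapse_like = prefix_if_subject_first(
    "Canada is located in", "Canada", "Once upon a time."
)
normal = prefix_if_subject_first(
    "The capital of Canada is", "Canada", "Once upon a time."
)
print(collapse_like)
print(normal)
```

Only prompts whose subject opens the sentence are changed; all other prompts pass through untouched, so ordinary evaluation prompts are unaffected.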

Abstract

Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that contribute to the collapse: i) inconsistent handling of prefixed and unprefixed keys in the parameter update equation may result in very small denominators, causing excessively large parameter updates; ii) the subject of collapse cases is usually the first token, whose unprefixed key distribution significantly differs from the prefixed key distribution in autoregressive transformers, causing the aforementioned issue to materialize. To validate our findings, we propose a simple yet effective approach: uniformly using prefixed keys during the editing phase and adding prefixes during the testing phase to ensure consistency between training and testing. The experimental results show that the proposed solution can prevent model collapse while maintaining the effectiveness of the edits.
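
The first factor can be illustrated numerically. The sketch below assumes the commonly cited ROME rank-one form, $\Delta = (\bm{v}_* - W\bm{k})(C^{-1}\bm{k})^\top / \big((C^{-1}\bm{k})^\top \bm{k}\big)$, and lets the key in the denominator differ from the key used elsewhere. The function name, the identity matrix standing in for $C^{-1}$, and the toy vectors are all assumptions chosen for illustration, not the paper's data or implementation.

```python
import numpy as np


def rank_one_update(W, C_inv, k_num, k_den, v_star):
    """Toy rank-one update where the denominator key may differ.

    k_num is used for the direction C^{-1} k and the residual v* - W k;
    k_den is the key that appears in the denominator (assumption: this
    mimics mixing prefixed and unprefixed keys in one equation).
    """
    u = C_inv @ k_num                      # C^{-1} k
    denom = float(u @ k_den)               # (C^{-1} k)^T k_den
    delta = np.outer(v_star - W @ k_num, u) / denom
    return delta, denom


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))                # toy weight matrix
v_star = rng.normal(size=3)                # toy target value
C_inv = np.eye(4)                          # identity stand-in for C^{-1}

k_prefixed = np.array([1.0, 0.0, 0.0, 0.0])
# An "unprefixed" key drawn from a very different region of key space.
k_unprefixed = np.array([1e-3, 5.0, 0.0, 0.0])

delta_ok, denom_ok = rank_one_update(W, C_inv, k_prefixed, k_prefixed, v_star)
delta_bad, denom_bad = rank_one_update(W, C_inv, k_prefixed, k_unprefixed, v_star)

print("consistent keys:   denom =", denom_ok, " |delta| =", np.linalg.norm(delta_ok))
print("inconsistent keys: denom =", denom_bad, " |delta| =", np.linalg.norm(delta_bad))
```

With consistent keys the edited matrix maps the key exactly to the target value. With mismatched keys the denominator shrinks toward zero and the update norm grows in inverse proportion, which is the mechanism the abstract describes.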

Paper Structure

This paper contains 14 sections, 4 equations, 8 figures, and 7 tables.

Introduction
Background
Why Does ROME Cause Collapse?
Inconsistent Keys in Editing
Anomalous Key Distribution for Collapse
Special Role of the First Token
A Simple Solution to Avoid Collapse
Conclusion and Future Work
Appendix
Distribution of Keys in Other LLMs
Results without Prepended Token
Representation of First Token in T5-3B
Impact of Position Embedding
Collapse of First Token Representation

Figures (8)

Figure 1: To update "the president of the United States" from "Donald Trump" to "Joe Biden", ROME localizes the knowledge to the MLP module within a specific transformer block using the Causal Tracing mechanism. It then adjusts the second layer of the MLP (i.e., weight matrix $W$) to change the value $\bm{v}$ for the key $\bm{k}$ that represents the subject "the United States" to a new value $\bm{v}_*$, thereby inducing the LLM to predict the target object "Joe Biden".
Figure 2: t-SNE visualization of (a) elements in the denominator; (b) different implementations of key vectors.
Figure 3: t-SNE visualization of representation distributions of (a) the first token in randomly sampled normal prompts; (b) $\bm{k}^{u}$ in prefixed collapse prompts.
Figure 4: t-SNE visualization of (a) elements in the denominator; (b) different implementations of key vectors for GPT-J.
Figure 5: t-SNE visualization of (a) elements in the denominator; (b) different implementations of key vectors for Llama2-7b.
...and 3 more figures
