Understanding the Collapse of LLMs in Model Editing
Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen
TL;DR
This work investigates why a single edit with Rank-One Model Editing (ROME) can cause a large language model to collapse. It identifies two root causes: the parameter update equation mixes prefixed and unprefixed keys, which can make its denominator very small, and the first token in an autoregressive transformer has an anomalous representation that differs from that of subsequent tokens. The authors show that using prefixed keys consistently during editing prevents collapse but initially harms edit efficacy because of a mismatch between editing and testing. Their remedy is simple: prepend a prefix to the prompts of collapse cases at test time so that testing matches editing, which improves efficacy on GPT-2-XL, GPT-J, and Llama2-7b while preserving stability. The findings offer a practical path to safer, more reliable model editing in deployed LLMs, and point to broader validation and deeper analysis of first-token dynamics as next steps.
Abstract
Despite significant progress in model editing methods, their application in real-world scenarios remains challenging because they often cause large language models (LLMs) to collapse. Among these methods, ROME is particularly concerning, as it can disrupt an LLM with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that contribute to it: i) inconsistent handling of prefixed and unprefixed keys in the parameter update equation may result in very small denominators, causing excessively large parameter updates; ii) the subject in collapse cases is usually the first token, whose unprefixed key distribution differs significantly from the prefixed key distribution in autoregressive transformers, causing the aforementioned issue to materialize. To validate our findings, we propose a simple yet effective approach: uniformly using prefixed keys during the editing phase and adding prefixes during the testing phase to ensure consistency between training and testing. The experimental results show that the proposed solution can prevent model collapse while maintaining the effectiveness of the edits.
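The small-denominator effect in factor i) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a rank-one update of the form Δ = (v − W k)(C⁻¹k)ᵀ / ((C⁻¹k)ᵀ k_den), uses random toy matrices, and deliberately constructs a hypothetical `k_unprefixed` that is almost orthogonal to C⁻¹k. All names (`rank_one_update`, `k_prefixed`, `k_unprefixed`) are illustrative.

```python
import numpy as np

def rank_one_update(W, C_inv, k_num, k_den, v):
    """Return (delta, denom) for delta = (v - W k_num)(C^-1 k_num)^T / denom,
    where denom = (C^-1 k_num)^T k_den. Passing k_den != k_num mimics the
    inconsistent use of keys described in the abstract (an assumption here)."""
    u = C_inv @ k_num                 # direction of the rank-one update
    residual = v - W @ k_num          # gap between target value and current output
    denom = float(u @ k_den)
    return np.outer(residual, u) / denom, denom

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W = rng.normal(size=(d_out, d_in))        # toy weight matrix
A = rng.normal(size=(d_in, d_in))
C = A @ A.T + d_in * np.eye(d_in)         # toy positive-definite key covariance
C_inv = np.linalg.inv(C)
k_prefixed = rng.normal(size=d_in)        # stands in for a prefixed key
v = rng.normal(size=d_out)                # stands in for the target value

# Hypothetical unprefixed key, built so that (C^-1 k_prefixed)^T k_unprefixed = 1e-6.
u = C_inv @ k_prefixed
r = rng.normal(size=d_in)
k_unprefixed = r - (u @ r) / (u @ u) * u + 1e-6 * u / (u @ u)

# Consistent keys: the denominator is k^T C^-1 k, which is positive.
delta_ok, den_ok = rank_one_update(W, C_inv, k_prefixed, k_prefixed, v)
# Inconsistent keys: the denominator is tiny, so the update is enormous.
delta_bad, den_bad = rank_one_update(W, C_inv, k_prefixed, k_unprefixed, v)

norm_ok = np.linalg.norm(delta_ok)
norm_bad = np.linalg.norm(delta_bad)
print(f"consistent:   denom={den_ok:.3e}, update norm={norm_ok:.3e}")
print(f"inconsistent: denom={den_bad:.3e}, update norm={norm_bad:.3e}")
```

With consistent keys the edited weights map the key exactly to the target value, whereas the mismatched denominator scales the same update direction by several orders of magnitude. In the paper's setting the mismatch arises from first-token subjects rather than from a hand-built vector as in this toy example.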
