Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing
Xiaopeng Li, Shanwen Wang, Shasha Li, Shezheng Song, Bin Ji, Jun Ma, Jie Yu
TL;DR
This work scrutinizes residual distribution in locate-then-edit model-editing methods and shows that distributing residuals introduces weight-shift errors that degrade edits, and that these errors grow with batch size, edit sequence length, and distribution distance. It proves an upper bound on the weight-update error under residual distribution and demonstrates empirically that directly computed boundary-layer residuals yield better editing performance. The authors propose BLUE, which updates only the first and last critical layers, and show across three LLMs and two datasets that BLUE improves editing performance by 35.59% on average, improves general capability retention, and mitigates hidden-state shifts, while also boosting efficiency in sequential and long-form editing tasks. These results offer a practical, scalable improvement to locate-then-edit methods and broaden their applicability to real-world knowledge updates.
Abstract
Model editing enables targeted updates to the knowledge of large language models (LLMs) with minimal retraining. Among existing approaches, locate-then-edit methods constitute a prominent paradigm: they first identify critical layers, then compute residuals at the final critical layer based on the target edit, and finally apply least-squares-based multi-layer updates via $\textbf{residual distribution}$. While empirically effective, we identify a counterintuitive failure mode: residual distribution, a core mechanism in these methods, introduces weight shift errors that undermine editing precision. Through theoretical and empirical analysis, we show that such errors increase with the distribution distance, batch size, and edit sequence length, ultimately leading to inaccurate or suboptimal edits. To address this, we propose the $\textbf{B}$oundary $\textbf{L}$ayer $\textbf{U}$pdat$\textbf{E (BLUE)}$ strategy to enhance locate-then-edit methods. Sequential batch editing experiments on three LLMs and two datasets demonstrate that BLUE not only delivers an average performance improvement of 35.59\%, significantly advancing the state of the art in model editing, but also enhances the preservation of LLMs' general capabilities. Our code is available at https://github.com/xpq-tech/BLUE.
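
The residual-distribution step described above can be illustrated with a small numerical sketch. It is not the authors' implementation: the split of the final-layer residual as residual / (L - l + 1) and the regularized closed-form update are assumptions modeled on MEMIT-style editors, and every function name, tensor shape, and the `lam` parameter is hypothetical. The `boundary_layers` helper only reflects the statement that BLUE updates the first and last critical layers; how BLUE computes the residuals at those layers is not shown here.

```python
import numpy as np


def distribute_residual(residual, critical_layers, layer):
    """Assumed MEMIT-style share of the final-layer residual for `layer`.

    Earlier layers receive a smaller share because more layers remain
    between them and the final critical layer.
    """
    last = critical_layers[-1]
    return residual / (last - layer + 1)


def least_squares_update(keys, residual, cov, lam=1.0):
    """Assumed closed form: Delta = R K^T (lam * C + K K^T)^{-1}.

    keys:     (d_in, n)  key vectors for the n edits
    residual: (d_out, n) residual this layer should absorb
    cov:      (d_in, d_in) symmetric covariance of preserved keys
    """
    gram = lam * cov + keys @ keys.T
    # gram is symmetric, so solving gram X = K R^T gives X = Delta^T.
    return np.linalg.solve(gram, keys @ residual.T).T


def boundary_layers(critical_layers):
    """First and last critical layers, the only ones BLUE updates."""
    return [critical_layers[0], critical_layers[-1]]


# Toy setup: 5 critical layers, 3 edits, hypothetical dimensions.
rng = np.random.default_rng(0)
critical_layers = [4, 5, 6, 7, 8]
keys = rng.normal(size=(16, 3))
residual = rng.normal(size=(8, 3))
cov = np.eye(16)

for layer in critical_layers:
    share = distribute_residual(residual, critical_layers, layer)
    delta = least_squares_update(keys, share, cov)
    print(layer, np.linalg.norm(share), delta.shape)

print("boundary layers:", boundary_layers(critical_layers))
```

In this sketch the first critical layer receives one fifth of the residual, so its update is derived from a scaled-down target rather than from a residual computed at that layer, which is the kind of distribution distance the abstract links to weight-shift errors.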
