Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing

Xiaopeng Li, Shanwen Wang, Shasha Li, Shezheng Song, Bin Ji, Jun Ma, Jie Yu

TL;DR

This work scrutinizes residual distribution in locate-then-edit model-editing approaches and reveals that distributing residuals can introduce weight-shift errors that worsen edits, especially with larger batch sizes, longer edit sequences, or greater distribution distances. It proves a theoretical upper bound on weight-update error under distribution and demonstrates empirically that directly computed boundary-layer residuals yield superior editing performance. The authors propose BLUE, updating only the first and last critical layers, and show across three LLMs and two datasets that BLUE enhances editing efficacy by about 35.59% on average, improves general capability retention, and mitigates hidden-state shifts, while also boosting efficiency in sequential and long-form editing tasks. These results offer a practical, scalable improvement to locate-then-edit methods and broaden their applicability in real-world knowledge updates.

Abstract

Model editing enables targeted updates to the knowledge of large language models (LLMs) with minimal retraining. Among existing approaches, locate-then-edit methods constitute a prominent paradigm: they first identify critical layers, then compute residuals at the final critical layer based on the target edit, and finally apply least-squares-based multi-layer updates via $\textbf{residual distribution}$. While empirically effective, we identify a counterintuitive failure mode: residual distribution, a core mechanism in these methods, introduces weight shift errors that undermine editing precision. Through theoretical and empirical analysis, we show that such errors increase with the distribution distance, batch size, and edit sequence length, ultimately leading to inaccurate or suboptimal edits. To address this, we propose the $\textbf{B}$oundary $\textbf{L}$ayer $\textbf{U}$pdat$\textbf{E (BLUE)}$ strategy to enhance locate-then-edit methods. Sequential batch editing experiments on three LLMs and two datasets demonstrate that BLUE not only delivers an average performance improvement of 35.59\%, significantly advancing the state of the art in model editing, but also enhances the preservation of LLMs' general capabilities. Our code is available at https://github.com/xpq-tech/BLUE.

Paper Structure

This paper contains 34 sections, 2 theorems, 14 equations, 14 figures, 9 tables.

Key Result

Theorem 4.1

In locate-then-edit model editing with residual distribution, the weight shift error between the exact weight shift $\Delta^{l^*}$ and the actual weight shift $\Delta^{l}$ admits an upper bound expressed in terms of ${\bm{R}}^{l^*}$, the exact residual, and ${\bm{Q}} = {{\bm{K}}_1^l}^T\left({\bm{K}}_0^l {{\bm{K}}_0^l}^T+{\bm{K}}_1^l {{\bm{K}}_1^l}^T\right)^{-1}$.
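
To make the quantities in the theorem concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds ${\bm{Q}}$ exactly as defined above and compares the weight shift obtained from a directly computed residual with the one obtained from a distributed residual. Several things here are assumptions for illustration: the random matrices standing in for keys and residuals, the helper names `q_matrix`, `weight_shift`, and `shift_error_and_bound`, the closed form $\Delta = {\bm{R}}{\bm{Q}}$ (the usual least-squares update in this family of methods), and the rule of dividing the last-layer residual by the number of remaining critical layers. The bound it checks is only the generic sub-multiplicative inequality $\|({\bm{R}}^{l^*} - {\bm{R}}^{l}){\bm{Q}}\|_2 \le \|{\bm{R}}^{l^*} - {\bm{R}}^{l}\|_2 \, \|{\bm{Q}}\|_2$, which may differ in form from the bound stated in Theorem 4.1.

```python
import numpy as np


def q_matrix(K0: np.ndarray, K1: np.ndarray) -> np.ndarray:
    """Return Q = K1^T (K0 K0^T + K1 K1^T)^{-1}, shape (n_edits, d_key).

    K0: keys to preserve, shape (d_key, n_preserved).
    K1: keys being edited, shape (d_key, n_edits).
    """
    C = K0 @ K0.T + K1 @ K1.T  # symmetric (d_key, d_key) matrix
    # C is symmetric, so K1^T C^{-1} equals (C^{-1} K1)^T; solve instead of inverting.
    return np.linalg.solve(C, K1).T


def weight_shift(R: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Assumed closed-form update Delta = R Q, shape (d_val, d_key)."""
    return R @ Q


def shift_error_and_bound(R_exact, R_dist, Q):
    """Spectral-norm shift error and the generic sub-multiplicative bound."""
    err = np.linalg.norm(weight_shift(R_exact, Q) - weight_shift(R_dist, Q), 2)
    bound = np.linalg.norm(R_exact - R_dist, 2) * np.linalg.norm(Q, 2)
    return err, bound


# Hypothetical toy dimensions and random stand-ins for real activations.
rng = np.random.default_rng(0)
d_key, d_val, n_preserved, n_edits = 16, 8, 64, 4
K0 = rng.standard_normal((d_key, n_preserved))
K1 = rng.standard_normal((d_key, n_edits))

R_exact = rng.standard_normal((d_val, n_edits))  # residual computed directly at layer l
R_last = rng.standard_normal((d_val, n_edits))   # residual at the last critical layer
layers_remaining = 3                             # assumed distribution distance
R_dist = R_last / layers_remaining               # assumed distribution rule

Q = q_matrix(K0, K1)
err, bound = shift_error_and_bound(R_exact, R_dist, Q)
print(f"shift error = {err:.4f}, generic bound = {bound:.4f}")
```

With real activations, `K0`, `K1`, and both residuals would come from the model being edited; the sketch only shows how a mismatch between the distributed and directly computed residuals carries through ${\bm{Q}}$ into the weight update.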

Figures (14)

  • Figure 1: Comparison of existing locate-then-edit methods and BLUE.
  • Figure 2: The average contribution score of different simulated editing layers.
  • Figure 3: The variation in cosine similarity between the distributed and the directly computed memory across different layers.
  • Figure 4: Performance variations when editing different single layers of the model using computed and distributed residuals separately. Fluency and Consistency are normalized.
  • Figure 5: Variation of $\| {\bm{R}}^{l^*} - {\bm{R}}^{L} \|_2$ across layers.
  • ...and 9 more figures

Theorems & Definitions (4)

  • Theorem 4.1
  • Remark 4.2
  • Lemma 4.3
  • proof