Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing

Bhiman Kumar Baghel, Emma Jordan, Zheyuan Ryan Shi, Xiang Lorraine Li

TL;DR

This work tackles the challenge of keeping LLM knowledge up to date by addressing two failure modes of model editing: UnderEdit (failure to inject an update) and OverEdit (unintended changes to neighboring knowledge). It introduces two complementary strategies, Iterative Model Editing and Neighbor-Assisted Model Editing, that operate within the standard two-stage locate-and-edit framework (an Optimization stage that learns ideal hidden states in causal layers, followed by a Spread stage that updates the final MLP weights). Empirical results across multiple editors (MEMIT, PMET, AlphaEdit, ROME), model families (GPT-2 XL, GPT-J, Llama-2, Llama-3.1), and factual benchmarks (COUNTERFACT, ZsRE) show reductions in UnderEdit of up to 38 percentage points and in OverEdit of up to 6 percentage points, with iterative editing improving edit success and neighbor-assisted editing improving specificity. The methods apply to existing and future locate-and-edit approaches, and the code is released for reproducibility and for integration into practical knowledge-update pipelines.

Abstract

Large Language Models (LLMs) are widely deployed in downstream tasks, but keeping their knowledge up-to-date via retraining or fine-tuning is often computationally expensive. Model editing provides a more efficient alternative by updating a targeted subset of parameters, which often follows the locate-and-edit paradigm. Despite this efficiency, existing methods are limited: edits may fail to inject knowledge (UnderEdit) or unintentionally disrupt unrelated neighboring knowledge (OverEdit). To address these challenges, we propose two complementary methods: iterative model editing, which applies successive edits to mitigate UnderEdit, and neighbor-assisted model editing, which incorporates neighboring knowledge during editing to reduce OverEdit. Our extensive experiments show that these techniques improve editing performance across multiple LLMs, algorithms, and benchmarks, reducing UnderEdit by up to 38 percentage points and OverEdit by up to 6, while remaining broadly applicable to any locate-and-edit method. We release our code at https://github.com/bhimanbaghel/ResolveUnderOverEdit.

Paper Structure

This paper contains 38 sections, 18 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: An example from COUNTERFACT that updates the iPad's producer from Apple to Honda. As shown in (a), UnderEdit fails to make the desired update in the Edit sentence, while OverEdit introduces an undesired change in the Test sentences. As shown in (b), the proposed iterative model editing mitigates UnderEdit, and neighbor-assisted model editing reduces OverEdit by incorporating related knowledge in the edit stage.
  • Figure 2: The diagram shows a simplified transformer layer, composed of attention and MLP modules, to complement Table \ref{tab:algos}. Only the last MLP is shown, as all methods modify its parameters.
  • Figure 3: An example of using MEMIT to edit GPT-J, in which iterative model editing resolves UnderEdit. As the iterations proceed, the perplexity difference eventually falls to $\leq \epsilon$, and the model predicts the new object. The perplexity values are Box-Cox transformed to better visualize extremely high and low values.
  • Figure 4: COUNTERFACT data sample for neighbor-assisted model editing.
  • Figure 5: Improvement in efficacy accuracy and reduction in $|\Delta p_k|$ for UnderEdit examples over the iterations of iterative model editing. The results show that iterative editing mitigates UnderEdit cases in GPT-2 XL edited with MEMIT on COUNTERFACT, contributing to overall performance gains.
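
The captions for Figures 3 and 5 describe iterative editing as repeating an edit until the perplexity difference $|\Delta p_k|$ drops to at most $\epsilon$. The control flow of such a loop might look like the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' implementation: `edit_once`, `epsilon`, and `max_iters` are hypothetical names, the exact definition of $\Delta p_k$ is not given in this section, and the toy "model" is a single number standing in for real weights.

```python
from typing import Callable, List, Tuple, TypeVar

Model = TypeVar("Model")


def iterative_edit(
    model: Model,
    edit_once: Callable[[Model], Tuple[Model, float]],
    epsilon: float,
    max_iters: int = 10,
) -> Tuple[Model, List[float]]:
    """Repeat an edit until the reported perplexity gap is within epsilon.

    `edit_once` is a hypothetical callable: it runs one full locate-and-edit
    pass (Optimization, then Spread) and returns the edited model together
    with the perplexity difference measured for that pass.
    """
    gaps: List[float] = []
    for _ in range(max_iters):
        model, gap = edit_once(model)
        gaps.append(gap)
        if abs(gap) <= epsilon:  # stopping rule: |delta p_k| <= epsilon
            break
    return model, gaps


# Toy stand-in (assumption): the "model" is a float and each edit halves it.
def toy_edit(residual: float) -> Tuple[float, float]:
    new_residual = residual / 2.0
    return new_residual, new_residual


final_model, gap_history = iterative_edit(8.0, toy_edit, epsilon=1.0)
print(final_model, gap_history)
```

In this toy run the loop stops after three passes, once the gap reaches the threshold. The `max_iters` cap is a design choice added here so that the loop terminates even if the gap never falls below `epsilon`.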