Neighboring Perturbations of Knowledge Editing on Large Language Models

Jun-Yu Ma; Zhen-Hua Ling; Ningyu Zhang; Jia-Chen Gu

TL;DR

This work examines how knowledge editing in LLMs perturbs neighboring knowledge when a new fact is appended. It introduces the additivity metric and the PEAK benchmark to quantify these perturbations, and presents APP, a plug-and-play framework that preserves the original correct answers and suppresses false answers during appending without retraining. Across four LLMs and four editing methods, the authors find that existing editors cause noticeable neighboring perturbations, while APP consistently reduces the additivity measures AFF and ANF and improves locality. The result is a practical evaluation protocol and a mitigation strategy that can make knowledge editing more reliable in real-world deployments, including domain-specific and safety-critical applications.

Abstract

Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Because retraining LLMs is resource-intensive, knowledge editing has attracted growing interest. However, current approaches and evaluations rarely explore how editing perturbs neighboring knowledge. This paper studies whether injecting new knowledge into LLMs perturbs the neighboring knowledge encapsulated within them. Specifically, we investigate whether appending a new answer to the answer list of a factual question leads to catastrophic forgetting of the original correct answers in this list, as well as unintentional inclusion of incorrect answers. A metric of additivity is introduced, and a benchmark dubbed Perturbation Evaluation of Appending Knowledge (PEAK) is constructed to evaluate the degree of perturbation to neighboring knowledge when appending new knowledge. In addition, a plug-and-play framework termed Appending via Preservation and Prevention (APP) is proposed to mitigate the neighboring perturbation by maintaining the integrity of the answer list. Experiments demonstrate the effectiveness of APP when coupled with four editing methods on four LLMs. The code and data are available at https://github.com/mjy1111/PEAK.
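
To make the two failure modes in the abstract concrete, the following is a minimal sketch of how forgetting and unintentional inclusion could be measured from answer probabilities before and after an edit. It is not the paper's additivity metric (AFF/ANF), whose formal definition is not reproduced here. The function name `perturbation_summary`, the two scores it returns, and all probability values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def perturbation_summary(
    p_before: Dict[str, float],
    p_after: Dict[str, float],
    correct: List[str],
    false: List[str],
) -> Dict[str, float]:
    """Toy summary of neighboring perturbation (NOT the paper's AFF/ANF)."""
    # Forgetting: mean relative probability drop over the original correct
    # answers; an answer whose probability rises contributes zero.
    forgetting = sum(
        max(0.0, p_before[o] - p_after[o]) / p_before[o] for o in correct
    ) / len(correct)

    # Intrusion: fraction of false answers that, after editing, outrank the
    # least likely original correct answer.
    weakest_correct = min(p_after[o] for o in correct)
    intrusion = sum(p_after[o] > weakest_correct for o in false) / len(false)

    return {"forgetting": forgetting, "intrusion": intrusion}


# Hypothetical probabilities for one question, before and after an edit.
p_before = {"ans_a": 0.4, "ans_b": 0.2, "false_x": 0.01, "false_y": 0.05}
p_after = {"ans_a": 0.2, "ans_b": 0.2, "false_x": 0.01, "false_y": 0.3}

scores = perturbation_summary(
    p_before, p_after, correct=["ans_a", "ans_b"], false=["false_x", "false_y"]
)
print(scores)
```

In this toy case one correct answer loses half of its probability mass and one of the two false answers overtakes a correct answer, so both scores are nonzero. An edit with no neighboring perturbation would yield zero for both.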

Paper Structure (45 sections, 17 equations, 5 figures, 9 tables)

Introduction
Related Work
Preliminary
Querying Factual Knowledge in LLMs
Knowledge Editing
Definition of Additivity
Relative ranking of objects
Absolute probability change of objects
Aggregation
PEAK: Perturbation Evaluation of Appending Knowledge
Data Construction of PEAK-CF
Aggregating facts
Constructing counterfactual edits
Sampling false answers
Filtering correct and false answers
...and 30 more sections

Figures (5)

Figure 1: Illustration of the neighboring perturbations that occur when appending a new answer to the answer list of a factual question. Catastrophic forgetting of original correct answers and unintentional inclusion of incorrect answers are both undesirable. $f_{\theta}$ / $f_{\theta_{e}}$ denote the model before / after editing.
Figure 2: Average probability of the correct answers $O$ and the false answers $O_h$ (Hard) and $O_r$ (Random) of LLaMA-2 after editing with different editing methods. LLaMA-2 refers to the unedited model. "+" means this method was coupled with APP.
Figure 3: Ablation analysis of probability and additivity for APP. Experiments were conducted with LLaMA-2 on the PEAK-CF dataset. Owing to the page limit, results for the other methods are given in Appendix \ref{append-ablation}.
Figure 4: AFF and ANF in the Hard setting on PEAK-CF with LLaMA-2 across four editing methods equipped with APP, where $k$ original correct and false answers were used, $k \in \{0, 1, 3, 5, \text{all}\}$.
Figure 5: Ablation analysis of probability and additivity for APP. Experiments were conducted with LLaMA-2 on the PEAK-CF dataset.
