See the Unseen: Better Context-Consistent Knowledge-Editing by Noises

Youcheng Huang, Wenqiang Lei, Zheng Zhang, Jiancheng Lv, Shuicheng Yan

TL;DR

This work tackles the problem of editing LLM knowledge while preserving context-consistency, and finds that context-induced shifts in FFN activations follow a Gaussian-like pattern. It proposes Deep Noise Editing (DNE), which injects Gaussian-like noise into FFN activations across multiple layers to simulate unseen contexts during editing, building on the ROME/MEMIT frameworks. Empirical results on GPT2-xl, GPT-J, and LLaMA-2 across zsRE and CounterFact show that DNE improves generalization to paraphrased prompts and related contexts, often outperforming NoisyTune and related baselines. The approach offers a practical path to more robust, context-aware knowledge edits, with implications for the interpretability and safe deployment of edited LLMs.

Abstract

Knowledge-editing updates the knowledge stored in large language models (LLMs) and contributes to their interpretability and application. However, knowledge application is context-consistent: LLMs can recall the same knowledge in different contexts. Existing works ignore this property, so their edits generalize poorly. In this paper, we empirically find that the effects of different contexts on LLMs recalling the same knowledge follow a Gaussian-like distribution. We then sample Gaussian noise to simulate the effects of different contexts when updating LLMs. In this way, LLMs see the unseen contexts in which the edited knowledge will be applied, which improves editing generalization. Experimental results on three LLMs demonstrate the effectiveness of our method and distinguish it from other approaches that fine-tune LLMs with noise.
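
The noise-sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `simulate_context_shifts`, the `noise_std` parameter, and the example vector are all hypothetical. It only assumes that a context-induced shift can be approximated by zero-mean Gaussian noise added to an activation vector. The actual method applies noise to FFN activations inside the model during editing, which this sketch does not attempt.

```python
import random
from typing import List


def simulate_context_shifts(
    activation: List[float],
    noise_std: float,
    num_samples: int,
    seed: int = 0,
) -> List[List[float]]:
    """Return noisy copies of an activation vector.

    Each copy adds independent zero-mean Gaussian noise to every
    coordinate, standing in for the shift an unseen context might cause.
    The noise scale is an assumption chosen for illustration only.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [
        [value + rng.gauss(0.0, noise_std) for value in activation]
        for _ in range(num_samples)
    ]


# Hypothetical activation vector for a knowledge-related token.
base_activation = [0.5, -1.2, 3.0]
noisy_batch = simulate_context_shifts(base_activation, noise_std=0.1, num_samples=4)
for row in noisy_batch:
    print([round(v, 3) for v in row])
```

In an editing setup, each noisy copy would play the role of the same knowledge seen under a different, unseen context, so the update is fitted against a spread of activations rather than a single one.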

Paper Structure (21 sections, 5 equations, 12 figures, 11 tables)

Figures (12)

  • Figure 1: Different contexts shift FFN activations on knowledge-related tokens, and these shifts follow a Gaussian-like distribution. We achieve better context-consistent knowledge-editing by sampling noise to simulate these effects.
  • Figure 2: GPT2-xl ${\mathbb{H}}_s,\!{\mathbb{H}}_c$.
  • Figure 3: GPT2-xl ${\mathbb{D}}_s,\!{\mathbb{D}}_c$.
  • Figure 4: GPT-J ${\mathbb{H}}_s,\!{\mathbb{H}}_c$.
  • Figure 5: GPT-J ${\mathbb{D}}_s,\!{\mathbb{D}}_c$.
  • ...and 7 more figures