See the Unseen: Better Context-Consistent Knowledge-Editing by Noises
Youcheng Huang, Wenqiang Lei, Zheng Zhang, Jiancheng Lv, Shuicheng Yan
TL;DR
This work tackles the problem of editing LLM knowledge while preserving context-consistency, revealing that context-induced shifts in FFN activations follow a Gaussian-like pattern. It proposes Deep Noise Editing (DNE), which injects Gaussian-like noise into FFN activations across multiple layers to simulate unseen contexts during editing, building on the ROME/MEMIT frameworks. Empirical results on GPT2-XL, GPT-J, and LLaMA-2 across zsRE and CounterFact show that DNE improves generalization to paraphrased prompts and related contexts, often outperforming NoisyTune and related baselines. The approach offers a practical path to more robust, context-aware knowledge edits, with implications for the interpretability and safe deployment of edited LLMs.
Abstract
Knowledge-editing updates the knowledge of large language models (LLMs) and contributes to the interpretability and application of LLMs. However, the application of knowledge is context-consistent: LLMs can recall the same knowledge in different contexts. Existing works ignore this property, so their edits generalize poorly. In this paper, we empirically find that the effects of different contexts on LLMs recalling the same knowledge follow a Gaussian-like distribution. We then sample Gaussian noise to simulate the effects of different contexts when updating LLMs. In this way, we let LLMs see the unseen contexts in which the edited knowledge will be applied, thereby improving the generalization of the edit. Experimental results on three LLMs demonstrate the effectiveness of our method and also distinguish it from other methods that fine-tune LLMs with noise.
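The core idea of simulating unseen contexts with sampled noise can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fit_shift_stats` and `sample_noisy_activations`, the per-dimension independent Gaussian, and the use of a single activation vector are all assumptions made for illustration. The sketch estimates the mean and standard deviation of context-induced activation shifts, then draws noise from the fitted Gaussian to produce perturbed activations that stand in for contexts not seen during editing.

```python
import numpy as np


def fit_shift_stats(clean, with_context):
    """Estimate per-dimension mean and std of context-induced shifts.

    clean:        (d,) activation for the bare prompt (hypothetical input)
    with_context: (n, d) activations for the same prompt under n contexts
    """
    shifts = with_context - clean  # broadcasts clean over the n rows
    return shifts.mean(axis=0), shifts.std(axis=0)


def sample_noisy_activations(clean, mean, std, n_samples, rng):
    """Add Gaussian noise to a clean activation to mimic unseen contexts.

    Assumption: each dimension is perturbed independently.
    """
    noise = rng.normal(loc=mean, scale=std, size=(n_samples, clean.shape[0]))
    return clean + noise


rng = np.random.default_rng(0)
clean = np.ones(4)
# Toy stand-in for activations observed under 50 different contexts.
observed = clean + rng.normal(0.0, 0.1, size=(50, 4))

mean, std = fit_shift_stats(clean, observed)
simulated = sample_noisy_activations(clean, mean, std, n_samples=8, rng=rng)
print(simulated.shape)
```

In an actual editing pipeline, the perturbed activations would presumably be fed into the editing objective in place of, or alongside, the clean ones, so that the update is optimized over a neighborhood of contexts rather than a single prompt.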
