Keys to Robust Edits: from Theoretical Insights to Practical Advances

Jianhao Yan, Futing Wang, Yun Luo, Yafu Li, Yue Zhang

TL;DR

This paper addresses the robustness gap in knowledge editing for large language models by showing that native internal representations used as semantic keys are unstable under perturbations. It introduces Robust Edit Pathway (REP), a plug-and-play adapter with a contrastive projection and a token-level gate that disassociates editing keys from native representations and optimizes whitened similarity to balance robustness and specificity. Theoretical analysis provides error-bound conditions for effective editing, and empirical results across multiple editors, models, and datasets demonstrate substantial robustness gains (up to 66.4% absolute) with controlled impact on locality and fluency. The work offers a practical, generalizable approach to reliable edits in LLMs and provides code to facilitate adoption, with implications for safer, long-context knowledge updates in real-world applications.

Abstract

Large language models (LLMs) struggle with maintaining accurate knowledge due to conflicting/outdated parametric memories. While locate-and-edit methods address this, their reliance on models' internal representations leads to robustness failures in long-context reasoning and paraphrased queries. We identify a fundamental limitation of locate-and-edit methods: existing semantic keys (for memory localization) cannot simultaneously satisfy robustness (context-invariant activation) and specificity (precise knowledge discrimination). Through theoretical error-bound analysis, we establish formal criteria for effective editing. Our solution introduces Robust Edit Pathway (REP), a plug-and-play module that: (1) disentangles editing keys from native model representations; (2) dynamically adjusts keys via contrastive learning to achieve robustness-specificity balance. Extensive experiments across various editing methods (ROME/MEMIT/R-ROME/EMMET), existing LLMs (LLaMA2, QWen, Mistral), and datasets (CounterFact, ZsRE) show that REP improves the success rate on robustness tests by up to 66.4% while leaving the standard editing success rate unaffected. Our code can be found at https://github.com/ElliottYan/RobustKeyEdit.

Paper Structure

This paper contains 35 sections, 6 theorems, 24 equations, 8 figures, 3 tables.

Key Result

Lemma 4.1

Given $K\in\mathbb{R}^{D_1 \times n}$ and $V\in\mathbb{R}^{D_2 \times n}$ as defined in Definition 3.2 that are already stored in the feed-forward layer $W\in\mathbb{R}^{D_2\times D_1}$, assume $n \gg D_1$ and $K$ has rank $D_1$. When a new query $k_*$ arrives, its corresponding value can be
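The setting of the lemma can be illustrated numerically. The sketch below is not the paper's code, and the lemma's conclusion is cut off above, so it only shows a standard property of the least-squares associative memory: under the stated assumptions ($n \gg D_1$, $K$ of rank $D_1$), the value retrieved for a query $k_*$ is a weighted sum of the stored values, with weights $K^\top (KK^\top)^{-1} k_*$. The toy sizes, the random data, and the helper name `whitened_weights` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D1, D2, n = 8, 4, 200                      # toy sizes chosen so that n >> D1

K = rng.normal(size=(D1, n))               # stored keys, one per column
V = rng.normal(size=(D2, D1)) @ K + 0.01 * rng.normal(size=(D2, n))  # stored values

C = K @ K.T                                # D1 x D1, invertible because rank(K) = D1
# Least-squares memory: W = V K^T C^{-1} (C is symmetric, so solve and transpose).
W = np.linalg.solve(C, K @ V.T).T


def whitened_weights(k_query, K, C):
    """Hypothetical helper: whitened similarity of a query to every stored key."""
    return K.T @ np.linalg.solve(C, k_query)


# A slightly perturbed copy of the first stored key plays the role of k_*.
k_star = K[:, 0] + 0.05 * rng.normal(size=D1)
weights = whitened_weights(k_star, K, C)

# The retrieved value is exactly the similarity-weighted sum of stored values.
retrieved = W @ k_star
assert np.allclose(retrieved, V @ weights)
```

The identity holds for any query, which is why the inverse key covariance (the "whitening" term) governs how strongly each stored association responds to a perturbed or paraphrased key.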

Figures (8)

  • Figure 1: An example of the edited knowledge 'Slovenia belongs to the continent of' through knowledge editing, and its failures in different scenarios.
  • Figure 2: Overview of REP. Left: Key concept visualization; Right: Architectural design of the adapter.
  • Figure 3: The distribution of normalized whitening similarity between different kinds of keys and original keys.
  • Figure 4: Left: CounterFact subjects have unrelated prefixes which are close in keys. The red dashed line indicates random keys baseline. Right: Semantically similar subjects bring challenges to specificity.
  • Figure 5: Hyper-parameter study of $\tau$ on validation set.
  • ...and 3 more figures

Theorems & Definitions (14)

  • Definition 3.1: Knowledge Editing for LLMs
  • Definition 3.2: MLP Layers as Associative Memories
  • Definition 3.3: The Solution of ROME
  • Remark 3.4: Extract $k_*$
  • Remark 3.5: Calculate $v_*$
  • Lemma 4.1: Fuzzy Key-Value Mapping
  • Corollary 4.2: Edited Key-Value as a Patch against Original Knowledge
  • Remark 4.3
  • Lemma 4.4: Bound on optimized $\Delta v=v_* - v_o$
  • Remark 4.5
  • ...and 4 more
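
Definition 3.3 refers to the ROME solution without reproducing it in this listing. As a hedged sketch, the code below implements the closed-form rank-one update from the original ROME formulation, $\hat{W} = W + \Lambda (C^{-1}k_*)^\top$ with $C = KK^\top$ and $\Lambda = (v_* - Wk_*) / \big((C^{-1}k_*)^\top k_*\big)$. Whether the paper's notation matches this term for term is an assumption, and the function names and toy data are hypothetical.

```python
import numpy as np


def rank_one_edit(W, C, k_star, v_star):
    """Rank-one update so that the edited weights map k_star exactly to v_star."""
    c_inv_k = np.linalg.solve(C, k_star)        # C^{-1} k_*
    residual = v_star - W @ k_star              # what the current layer gets wrong
    lam = residual / (c_inv_k @ k_star)         # Lambda in the closed form
    return W + np.outer(lam, c_inv_k)


def edit_strength(C, k_star, k_query):
    """Fraction of the edit that fires for k_query (equals 1.0 at k_star)."""
    c_inv_k = np.linalg.solve(C, k_star)
    return (c_inv_k @ k_query) / (c_inv_k @ k_star)


rng = np.random.default_rng(0)
D1, D2, n = 8, 4, 200
K = rng.normal(size=(D1, n))                    # toy stand-in for stored keys
W = rng.normal(size=(D2, D1))                   # toy stand-in for the MLP weights
C = K @ K.T

k_star = rng.normal(size=D1)                    # key of the fact being edited
v_star = rng.normal(size=D2)                    # target value for that key
W_hat = rank_one_edit(W, C, k_star, v_star)

# The edit is exact at the edited key.
assert np.allclose(W_hat @ k_star, v_star)

# For a perturbed key, the output shift is the full edit scaled by the
# whitened similarity between the perturbed key and the edited key.
k_pert = k_star + 0.5 * rng.normal(size=D1)
strength = edit_strength(C, k_star, k_pert)
assert np.allclose((W_hat - W) @ k_pert, strength * (v_star - W @ k_star))
```

The second assertion makes the robustness issue concrete: if a long context or paraphrase moves the key so that its whitened similarity to $k_*$ drops, the edit is applied only partially, which is the kind of failure the paper attributes to unstable native keys.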