Keys to Robust Edits: from Theoretical Insights to Practical Advances

Jianhao Yan; Futing Wang; Yun Luo; Yafu Li; Yue Zhang


TL;DR

This paper addresses the robustness gap in knowledge editing for large language models. It shows that the native internal representations used as semantic keys are unstable under perturbations. It introduces the Robust Edit Pathway (REP), a plug-and-play adapter with a contrastive projection and a token-level gate. REP decouples editing keys from native representations and optimizes a whitened similarity to balance robustness and specificity. A theoretical analysis gives error-bound conditions for effective editing. Experiments across multiple editors, models, and datasets show substantial robustness gains (up to 66.4% absolute) with a controlled impact on locality and fluency. The work offers a practical, generalizable route to reliable edits in LLMs, releases code to ease adoption, and is relevant to safer knowledge updates in long-context, real-world settings.
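The "whitened similarity" mentioned above can be illustrated with the standard closed-form rank-one update used by ROME, one of the editors REP is evaluated with. This is a sketch under stated assumptions, not REP itself: `rank_one_edit`, `whitened_similarity`, the covariance `cov`, and the toy vectors are hypothetical. After the edit maps key k* to value v*, any other key k receives the fraction s(k) = (k*ᵀC⁻¹k) / (k*ᵀC⁻¹k*) of the change, so a paraphrase whose native key drifts (s < 1) only partially triggers the edit.

```python
# Illustrative sketch of a ROME-style rank-one edit; not the REP adapter.
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    return [dot(row, x) for row in w]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def whitened_similarity(k_star, k, cov) -> float:
    """s(k) = (k*^T C^-1 k) / (k*^T C^-1 k*); s = 1 means the edit fires fully."""
    u = solve(cov, k_star)  # C^-1 k* (C symmetric, so u . k = k*^T C^-1 k)
    return dot(u, k) / dot(u, k_star)


def rank_one_edit(w, k_star, v_star, cov) -> Matrix:
    """ROME-style update: W' = W + (v* - W k*) (C^-1 k*)^T / (k*^T C^-1 k*)."""
    u = solve(cov, k_star)
    denom = dot(u, k_star)
    residual = [v - wk for v, wk in zip(v_star, matvec(w, k_star))]
    return [[w[i][j] + residual[i] * u[j] / denom for j in range(len(u))]
            for i in range(len(w))]


if __name__ == "__main__":
    cov = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical key covariance C
    w = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical MLP weight W
    k_star, v_star = [1.0, 0.0], [0.0, 3.0]
    w_new = rank_one_edit(w, k_star, v_star, cov)
    print(matvec(w_new, k_star))     # approximately [0.0, 3.0]: the edit succeeds
    k_para = [0.9, 0.3]              # hypothetical drifted paraphrase key
    print(whitened_similarity(k_star, k_para, cov))  # approximately 0.75
    print(matvec(w_new, k_para))     # approximately [0.15, 2.55]: the edit only partly fires
```

The update is exact for k* itself, so in this toy setting any loss of robustness comes entirely from how far a query's key moves in the whitened metric. Under this reading, that is the quantity that more stable, dedicated editing keys would aim to keep close to 1 for paraphrases and close to 0 for unrelated subjects.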

Abstract

Large language models (LLMs) struggle to maintain accurate knowledge because their parametric memories can be conflicting or outdated. Locate-and-edit methods address this, but their reliance on the model's internal representations leads to robustness failures on long-context reasoning and paraphrased queries. We identify a fundamental limitation of locate-and-edit methods: existing semantic keys (used for memory localization) cannot simultaneously satisfy robustness (context-invariant activation) and specificity (precise knowledge discrimination). Through a theoretical error-bound analysis, we establish formal criteria for effective editing. Our solution, the Robust Edit Pathway (REP), is a plug-and-play module that (1) disentangles editing keys from native model representations and (2) dynamically adjusts these keys via contrastive learning to balance robustness and specificity. Extensive experiments across editing methods (ROME, MEMIT, R-ROME, EMMET), LLMs (LLaMA2, Qwen, Mistral), and datasets (CounterFact, ZsRE) show that REP improves the success rate on robustness tests by up to 66.4% while leaving the original editing success rate unaffected. Our code is available at https://github.com/ElliottYan/RobustKeyEdit.
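The contrastive adjustment of keys described in the abstract can be sketched with an InfoNCE-style loss. This is our assumption for illustration, and the paper's actual objective, projection, and gate may differ: a hypothetical linear projection P produces editing keys kept separate from native hidden states, a paraphrase of the edited prompt serves as the positive (robustness), and a different subject serves as the negative (specificity). `project`, `info_nce`, the temperature `tau`, and all toy vectors are hypothetical.

```python
# Illustrative sketch, not the REP implementation: InfoNCE stands in for the
# paper's contrastive loss, and P is a hypothetical fixed projection.
import math
from typing import List, Sequence

Vector = List[float]


def project(p: Sequence[Sequence[float]], h: Sequence[float]) -> Vector:
    """Hypothetical editing-key projection k = P h, kept apart from the native h."""
    return [sum(a * b for a, b in zip(row, h)) for row in p]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, tau: float = 0.5) -> float:
    """-log softmax score of the positive among positive + negatives (cosine / tau)."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    h_edit = [1.0, 0.0, 0.0]                 # native state for the edited prompt
    h_para = [0.6, 0.0, 0.8]                 # same fact; dim 2 acts as context noise
    h_other = [0.5, math.sqrt(3) / 2, 0.0]   # a different subject
    p_native = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity: native keys
    p_robust = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]                   # drops the noise dim
    for name, p in [("native", p_native), ("projected", p_robust)]:
        loss = info_nce(project(p, h_edit), project(p, h_para), [project(p, h_other)])
        print(name, round(loss, 3))
```

With these toy numbers the projected keys yield a lower loss than the native ones: the paraphrase's context noise is projected away, so it matches the edited prompt exactly, while the other subject stays distinguishable. In an actual trained module, P would be learned by minimizing such a loss over many positives and negatives rather than fixed by hand.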

