Keys to Robust Edits: from Theoretical Insights to Practical Advances
Jianhao Yan, Futing Wang, Yun Luo, Yafu Li, Yue Zhang
TL;DR
This paper addresses the robustness gap in knowledge editing for large language models by showing that native internal representations used as semantic keys are unstable under perturbations. It introduces Robust Edit Pathway (REP), a plug-and-play adapter with a contrastive projection and a token-level gate that disassociates editing keys from native representations and optimizes whitened similarity to balance robustness and specificity. Theoretical analysis provides error-bound conditions for effective editing, and empirical results across multiple editors, models, and datasets demonstrate substantial robustness gains (up to 66.4% absolute) with controlled impact on locality and fluency. The work offers a practical, generalizable approach to reliable edits in LLMs and provides code to facilitate adoption, with implications for safer, long-context knowledge updates in real-world applications.
Abstract
Large language models (LLMs) struggle to maintain accurate knowledge because their parametric memories can be conflicting or outdated. Locate-and-edit methods address this, but their reliance on the model's internal representations leads to robustness failures in long-context reasoning and on paraphrased queries. We identify a fundamental limitation of locate-and-edit methods: existing semantic keys (used for memory localization) cannot simultaneously satisfy robustness (context-invariant activation) and specificity (precise knowledge discrimination). Through theoretical error-bound analysis, we establish formal criteria for effective editing. Our solution introduces Robust Edit Pathway (REP), a plug-and-play module that (1) disentangles editing keys from native model representations and (2) dynamically adjusts keys via contrastive learning to balance robustness and specificity. Extensive experiments across various editing methods (ROME/MEMIT/R-ROME/EMMET), existing LLMs (LLaMA2, QWen, Mistral), and datasets (CounterFact, ZsRE) show that REP improves the success rate on robustness tests by up to 66.4% while leaving the edit success rate unaffected. Our code can be found at https://github.com/ElliottYan/RobustKeyEdit.
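To make the robustness-specificity tension between keys more concrete, the following is a minimal sketch of one common reading of "whitened similarity": the cosine similarity between two key vectors after whitening them with the inverse square root of a key covariance matrix. This is an illustrative assumption, not the paper's definition; the function name `whitened_similarity`, the synthetic key distribution, and the two example keys are all hypothetical.

```python
import numpy as np

def whitened_similarity(k_a, k_b, cov, eps=1e-6):
    """Cosine similarity of two keys after whitening with cov^{-1/2}.

    Assumption: cov is an estimate of E[k k^T] over many keys.
    """
    d = cov.shape[0]
    # Regularize for numerical stability, then factor cov = L L^T.
    L = np.linalg.cholesky(cov + eps * np.eye(d))
    # Solving L x = k yields x = L^{-1} k, so a @ b = k_a^T cov^{-1} k_b.
    a = np.linalg.solve(L, k_a)
    b = np.linalg.solve(L, k_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine(k_a, k_b):
    return float(k_a @ k_b / (np.linalg.norm(k_a) * np.linalg.norm(k_b)))

# Synthetic, hypothetical key statistics: one dominant direction shared by
# all keys (dimension 0) and small distinguishing directions elsewhere.
rng = np.random.default_rng(0)
scales = np.array([10.0] + [1.0] * 7)
sample_keys = rng.normal(size=(2000, 8)) * scales
cov = sample_keys.T @ sample_keys / len(sample_keys)

# Two keys that agree on the dominant direction but differ on dimension 1.
k_edit = np.array([10.0, 1.0, 0, 0, 0, 0, 0, 0])
k_other = np.array([10.0, -1.0, 0, 0, 0, 0, 0, 0])

raw = cosine(k_edit, k_other)                        # dominated by dimension 0
white = whitened_similarity(k_edit, k_other, cov)    # dimension 0 is down-weighted
same = whitened_similarity(k_edit, k_edit, cov)      # a key matched against itself
print(f"raw cosine: {raw:.3f}, whitened: {white:.3f}, self: {same:.3f}")
```

In this toy setting the raw cosine treats the two keys as nearly identical because they share the high-variance direction, whereas the whitened similarity separates them. Under these assumptions, that is the sense in which a whitened measure can favor specificity; a robust key would additionally need to keep its whitened similarity high under context and paraphrase perturbations.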
