Beyond the Covariance Trap: Unlocking Generalization in Same-Subject Knowledge Editing for Large Language Models

Xiyu Liu, Qingyi Si, Zhengxiao Liu, Chenxu Yang, Naibin Gu, Zheng Lin

Abstract

While locate-then-edit knowledge editing efficiently updates knowledge encoded within Large Language Models (LLMs), a critical generalization failure mode emerges in the practical same-subject knowledge editing scenario: models fail to recall the updated knowledge when following user instructions, despite successfully recalling it in the original edited form. This paper identifies the geometric root of this generalization collapse as a fundamental conflict: the internal activation drift induced by prompt variations exceeds the model's geometric tolerance for generalization after editing. We attribute this instability to a dual pathology: (1) joint optimization with orthogonal gradients collapses solutions into sharp minima with narrow stability, and (2) the standard covariance constraint paradoxically acts as a Covariance Trap that amplifies input perturbations. To resolve this, we introduce RoSE (Robust Same-subject Editing), which employs Isotropic Geometric Alignment to minimize representational deviation and Hierarchical Knowledge Integration to smooth the optimization landscape. Extensive experiments demonstrate that RoSE significantly improves instruction-following capabilities, laying the foundation for robust interactive parametric memory of LLM agents.

Paper Structure (66 sections, 23 equations, 20 figures, 6 tables, 1 algorithm)

Figures (20)

  • Figure 1: Our work reveals that the current same-subject knowledge editing method MEMIT-Merge fails to generalize to instructed queries because Activation Deviation exceeds the edited model's Tolerance Radius ($D > R$) in the activation space. We unlock robustness by reshaping the geometry towards the safe condition $D \le R$.
  • Figure 2: Geometric Pathology: $D > R$. (a) Orthogonal gradients in joint same-subject editing cause the tolerance radius $R$ to collapse. (b) The covariance matrix $C$ acts as an amplification trap, pushing the deviation $D$ to approximately $26.1$, beyond $R$. The area below the green line (average $R$) is the safe zone. Replacing $C$ with the identity matrix suppresses $D$ to around $17.4$.
  • Figure 3: Distribution of gradient conflict scores for pairs of edits that concern the same subject but different relations. Conflict scores near 1 indicate that the update gradients are near-orthogonal.
  • Figure 4: Average pairwise cosine similarity of the keys $k$. The high similarity in the diagonal blocks (S1 vs. S1 Q-Form, S1 vs. S1 Instruction-Form) shows that the subject representation is stable across prompt formats. The near-zero similarity in the off-diagonal blocks (S1 vs. S2) reveals that keys of distinct subjects are near-orthogonal.
  • Figure 5: The RoSE framework. We expand the tolerance radius $R$ via Hierarchical Knowledge Integration (HKI) and shrink the activation deviation $D$ via Isotropic Geometric Alignment (IGA) towards the ideal condition $D \le R$.
  • ...and 15 more figures
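
The measurements behind Figures 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the key vectors are synthetic stand-ins, the function names are hypothetical, and the conflict score is assumed here to be $1 - |\cos|$, since the captions do not state the paper's exact definition.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def conflict_score(g1, g2):
    """Assumed definition: 1 - |cos(g1, g2)|, so 1 means orthogonal gradients."""
    cos = float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))
    return 1.0 - abs(cos)


rng = np.random.default_rng(0)
dim = 512

# Synthetic keys: each subject has a base direction plus small
# prompt-dependent noise (a stand-in for real MLP key activations).
s1_base = rng.normal(size=dim)
s2_base = rng.normal(size=dim)
s1_keys = s1_base + 0.1 * rng.normal(size=(4, dim))        # S1, original form
s1_instr_keys = s1_base + 0.1 * rng.normal(size=(4, dim))  # S1, instruction form
s2_keys = s2_base + 0.1 * rng.normal(size=(4, dim))        # S2

# Block averages, analogous to the diagonal and off-diagonal blocks of Figure 4.
same_subject = cosine_similarity_matrix(s1_keys, s1_instr_keys).mean()
cross_subject = cosine_similarity_matrix(s1_keys, s2_keys).mean()

print(f"same subject, different prompt form: {same_subject:.3f}")
print(f"different subjects:                  {cross_subject:.3f}")
```

In this toy setup the same-subject block averages close to 1 and the cross-subject block close to 0, mirroring the qualitative pattern the Figure 4 caption describes. With real models, the keys and gradients would be taken from the edited layer rather than sampled.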