CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing

Zarif Ikram, Arad Firouzkouhi, Stephen Tu, Mahdi Soltanolkotabi, Paria Rashidinejad

TL;DR

CrispEdit introduces a curvature-aware framework for non-destructive LLM editing by formulating editing as a quadratically constrained problem that preserves capabilities. It uses a Bregman-divergence-based Gauss-Newton surrogate to quantify capability loss and projects edit updates onto a $\gamma$-approximate nullspace of low-curvature directions, implemented efficiently with K-FAC and a matrix-free projector. The method attains strong edit reliability on large models while keeping base capabilities nearly intact, outperforming prior editors in both batch and sequential editing scenarios. The approach scales to billion-parameter models and provides a practical tool for targeted knowledge updates, safety refinements, and personalization with modest computational overhead.

Abstract

A central challenge in large language model (LLM) editing is capability preservation: methods that successfully change targeted behavior can quietly game the editing proxy and corrupt general capabilities, producing degenerate behaviors reminiscent of proxy/reward hacking. We present CrispEdit, a scalable and principled second-order editing algorithm that treats capability preservation as an explicit constraint, unifying and generalizing several existing editing approaches. CrispEdit formulates editing as constrained optimization and enforces the constraint by projecting edit updates onto the low-curvature subspace of the capability-loss landscape. At the crux of CrispEdit is expressing the capability constraint via a Bregman divergence, whose quadratic form yields the Gauss-Newton Hessian exactly, even when the base model is not trained to convergence. We make this second-order procedure efficient at the LLM scale using Kronecker-factored approximate curvature (K-FAC) and a novel matrix-free projector that exploits Kronecker structure to avoid constructing massive projection matrices. Across standard model-editing benchmarks, CrispEdit achieves high edit success while keeping capability degradation below 1% on average across datasets, significantly improving over prior editors.
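The projection step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single linear layer whose curvature is approximated K-FAC style as $\bm{S} \otimes \bm{A}$, with $\bm{A}$ the input-activation covariance and $\bm{S}$ the output-gradient covariance. The function names `low_curvature_mask` and `project_update`, the toy shapes, and the exact energy rule for $\gamma$ (drop the fewest top-eigenvalue directions whose eigenvalues sum to at least a fraction $\gamma$ of the total) are assumptions made for illustration.

```python
import numpy as np

def low_curvature_mask(eig_s, eig_a, gamma):
    """Mark which Kronecker eigen-directions are treated as low curvature.

    Assumed rule: drop the fewest top-eigenvalue directions whose eigenvalues
    sum to at least a fraction `gamma` of the total, and keep the rest.
    """
    lam = np.outer(eig_s, eig_a)               # eigenvalues of kron(S, A), shape (d_out, d_in)
    order = np.argsort(lam, axis=None)[::-1]   # flat indices, largest eigenvalue first
    energy = np.cumsum(lam.ravel()[order]) / lam.sum()
    n_high = int(np.searchsorted(energy, gamma)) + 1
    keep = np.ones(lam.size, dtype=bool)
    keep[order[:n_high]] = False               # high-curvature directions are removed
    return keep.reshape(lam.shape)

def project_update(dW, S, A, gamma=0.9):
    """Project a (d_out, d_in) update without forming the full projector.

    Only the two small factors are eigendecomposed; the
    (d_out*d_in) x (d_out*d_in) projection matrix is never built.
    """
    eig_s, U_s = np.linalg.eigh(S)
    eig_a, U_a = np.linalg.eigh(A)
    eig_s = np.clip(eig_s, 0.0, None)          # guard against tiny negative round-off
    eig_a = np.clip(eig_a, 0.0, None)
    keep = low_curvature_mask(eig_s, eig_a, gamma)
    M = U_s.T @ dW @ U_a                       # coordinates in the Kronecker eigenbasis
    return U_s @ (M * keep) @ U_a.T            # zero high-curvature coordinates, rotate back

# Toy data standing in for capability-set statistics (hypothetical sizes).
rng = np.random.default_rng(0)
d_out, d_in, n = 4, 6, 50
acts = rng.normal(size=(n, d_in))
grads = rng.normal(size=(n, d_out))
A = acts.T @ acts / n
S = grads.T @ grads / n
dW = rng.normal(size=(d_out, d_in))            # stand-in for a raw edit gradient
dW_proj = project_update(dW, S, A, gamma=0.9)
```

The design point is that the eigenvectors of $\bm{S} \otimes \bm{A}$ are Kronecker products of the factors' eigenvectors, so masking in the rotated coordinates is equivalent to applying the dense projector at a fraction of the memory cost.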

Paper Structure

This paper contains 21 sections, 3 theorems, 33 equations, 8 figures, 8 tables, 2 algorithms.

Key Result

Proposition 1

Fix an MLP layer $l$ and consider updating only the weights of layer $l$. Let $\bm{K}^l_{\text{cap}} \coloneqq \bm{I} \otimes [\bm{a}_{l-1}^1, \dots, \bm{a}_{l-1}^n]$ be the layer-input activations on the capability dataset, and $\bm{G}^l_{\text{cap}}$ be the GNH. Then, $\mathsf{Null}( \bm{K}^l_{\te

Figures (8)

  • Figure 1: Comparison overview of CrispEdit. CrispEdit achieves strong edit reliability and generality, with and without QA context, while preserving broad base capabilities (MMLU, GSM8K, IFEval, ARC-C, TruthfulQA) on LLaMA-3-8B-Instruct.
  • Figure 2: Geometric interpretation of CrispEdit compared to baseline editing strategies. Top left: Standard gradient descent effectively minimizes edit loss but moves perpendicular to the capability contours, resulting in high capability loss (degradation). Top right: Projecting onto the nullspace of activation covariance is overly conservative; it preserves representations but restricts the update too heavily to successfully optimize the edit loss. Bottom: CrispEdit projects the update onto the low-curvature subspace of the capability loss. This allows changes in representations to satisfy the edit while moving along the "valley" of the landscape to maintain general model capabilities.
  • Figure 3: Tradeoff between pre-training accuracy (capability preservation) and post-training performance (edit efficacy) for different nullspace projection methods. We fine-tune a LeNet-5 model, pre-trained on MNIST, on Fashion-MNIST in the $\gamma$-approximate nullspace of the embeddings (Adam-NSCL), the Hessian, and the Hessian approximations Gauss-Newton Hessian, K-FAC, and EK-FAC (CrispEdit), over a range of energy thresholds $\gamma$.
  • Figure 4: Runtime comparison of CrispEdit with other methods. We apply a number of model editing methods to edit LLaMA-3-8B-Instruct on 3,000 ZsRE samples and measure the wall-clock time for execution.
  • Figure 5: Effect of capability dataset size $n$ on editing performance and base capability preservation. We edit LLaMA-3-8B-Instruct on 3,000 ZsRE samples using CrispEdit for a range of $n$ and measure the editing performance and base capability preservation.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Definition 1: Bregman divergence
  • Proposition 1: AlphaEdit is more conservative
  • Proposition 2: Quadratic Approximation of Bregman Divergence
  • proof
  • Proposition 3
  • proof
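
For readers who want the objects named in Definition 1 and Proposition 2 spelled out, the following gives the standard textbook form of the Bregman divergence and its second-order expansion. It is a sketch, not a transcription of the paper's statements. The symbols $\phi$ (a twice-differentiable convex function), $\bm{z}$, $\bm{z}_0$, $f$, $\bm{\theta}_0$, $\Delta\bm{\theta}$, and $\bm{J}$ (the Jacobian of $f$ at $\bm{\theta}_0$) are notation assumed here for illustration.

```latex
% Bregman divergence generated by a differentiable convex function \phi
D_\phi(\bm{z}, \bm{z}_0) \coloneqq \phi(\bm{z}) - \phi(\bm{z}_0)
    - \langle \nabla\phi(\bm{z}_0),\, \bm{z} - \bm{z}_0 \rangle

% Second-order expansion around \bm{z}_0 (the linear term cancels by construction)
D_\phi(\bm{z}, \bm{z}_0) = \tfrac{1}{2} (\bm{z} - \bm{z}_0)^\top \nabla^2\phi(\bm{z}_0)\, (\bm{z} - \bm{z}_0)
    + o\!\left(\|\bm{z} - \bm{z}_0\|^2\right)

% Composing with model outputs \bm{z} = f(\bm{\theta}_0 + \Delta\bm{\theta}), \bm{z}_0 = f(\bm{\theta}_0),
% and linearizing f gives a Gauss-Newton-type quadratic form in the parameters
D_\phi\big(f(\bm{\theta}_0 + \Delta\bm{\theta}),\, f(\bm{\theta}_0)\big) \approx
    \tfrac{1}{2}\, \Delta\bm{\theta}^\top \bm{J}^\top \nabla^2\phi\big(f(\bm{\theta}_0)\big)\, \bm{J}\, \Delta\bm{\theta}
```

Because the definition subtracts the first-order term explicitly, the leading term is quadratic at any $\bm{\theta}_0$, not only at a stationary point of the loss. This is consistent with the abstract's remark that the construction holds even when the base model is not trained to convergence.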