SCAN: Sparse Circuit Anchor Interpretable Neuron for Lifelong Knowledge Editing

Yuhuan Liu; Haitian Zhong; Xinyuan Xia; Qiang Liu; Shu Wu; Liang Wang

Abstract

Large Language Models (LLMs) often suffer from catastrophic forgetting and collapse during sequential knowledge editing. This vulnerability stems from the prevailing dense editing paradigm, which treats models as black boxes and relies on coarse-grained parameter interventions that inevitably disrupt preserved knowledge. To address this, we propose SCAN, a sparse editing framework based on the Sparse Circuit Anchored Neuron, which transforms editing into mechanism-aware manipulation by constructing a knowledge circuit via Sparse Transcoders. Experiments on Gemma2, Qwen3, and Llama3.1 across CounterFact, ZsRE, and WikiFactDiff demonstrate that SCAN achieves superior performance, maintaining model integrity on benchmarks such as MMLU and GSM8K even after 3,000 sequential edits, whereas existing methods deteriorate progressively as edits accumulate, eventually resulting in model collapse.

Paper Structure (40 sections, 5 theorems, 41 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Preliminary
Lifelong Editing and Steer-based Method
Editing Mechanism: MLPs as Key-Value Memory
Sparse Transcoder and Monosemanticity
Edit with Sparse Circuit Anchored Neuron
Attribution Graph Construction
Gradient-Based Node Influence Computation
Prune by Total (one-step and multi-step) Attribution
Two-step Attribution.
Three-step and Total Attribution.
Sparse Edit and Knowledge Inject
Experiments
Experimental Setup
How does our method perform in sequential editing scenarios? (RQ1)
...and 25 more sections

Key Result

Proposition 3.2

Let $X,Y\subset \mathbb{R}^n$ be two spaces and let $f: X \to Y$ be a mapping such that $f(0)=0$, $f$ is differentiable at every point $x_0\in X$ with Jacobian matrix $J_f(x_0)$, and $J_f$ is continuous and non-singular at $0$. Then, we have […]. This implies that the direction of any vector transformed by $f$ is closely aligned with the direction induced by the Jacobian transformation.
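
The proposition's displayed formula does not appear in the text above. One reading consistent with its title and hypotheses, offered here as an assumption rather than as the paper's statement, is that the cosine similarity between $f(x)$ and $J_f(0)\,x$ tends to 1 as $x \to 0$. The minimal sketch below checks this numerically for a hypothetical map `f` (not taken from the paper) that satisfies $f(0)=0$ and has a non-singular Jacobian at the origin.

```python
import numpy as np

def f(x):
    # Hypothetical smooth map with f(0) = 0, chosen only for illustration.
    return np.array([
        np.sin(x[0]) + x[1] ** 2,
        x[0] * x[1] + 2.0 * np.tanh(x[1]),
    ])

def jacobian_at_zero():
    # Analytic Jacobian of the map above at the origin: diag(1, 2), non-singular.
    return np.array([[1.0, 0.0],
                     [0.0, 2.0]])

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def alignment(direction, scale):
    # Compare the direction of f(x) with the direction of J_f(0) x for x = scale * direction.
    x = scale * np.asarray(direction, dtype=float)
    return cosine(f(x), jacobian_at_zero() @ x)

direction = np.array([0.6, -0.8])  # arbitrary unit vector (assumption)
for scale in (1.0, 0.1, 0.01, 0.001):
    print(f"scale={scale:g}  cosine={alignment(direction, scale):.8f}")
```

Under these assumptions the printed cosine moves toward 1 as the scale shrinks, because $f(x) = J_f(0)\,x + o(\lVert x\rVert)$ while non-singularity keeps $\lVert J_f(0)\,x\rVert$ from vanishing faster than $\lVert x\rVert$.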

Figures (7)

Figure 1: Comparison of current methods and ours. Current methods (a) modify the entire dense MLP weight matrix. Our approach (b) isolates factual features, editing knowledge-relevant vectors.
Figure 2: Cumulative proportion of selected features across different token positions. (a) and (b) show the distributions for Gemma2-2B and Qwen3-8B on the CounterFact dataset, respectively.
Figure 3: Distribution of selected features across layers. Both models exhibit a characteristic dual-peak pattern, indicating functional localization in shallow and middle-to-deep layers.
Figure 4: Heatmap of the selected feature distribution across layers at special token positions. The dark regions indicate that the early-layer peaks in Figure 3 align with the subject tokens, while the later-layer peaks correspond to the last token position on both models.
Figure 5: Activation visualization of identified features on the specific prompts. The left column shows Feature #13366 at Layer 19, and the right column shows Feature #410 at Layer 24. Darker colors indicate higher activation values.
...and 2 more figures

Theorems & Definitions (12)

Definition 3.1: Initiation of Attribution Graph
Proposition 3.2: Jacobian as the Optimal Direction-Preserving Linearization
Definition 3.3: Direct (one-step) Attribution Matrix
Theorem 3.4: Full-derivative expansion
Proposition 3.5: Closed-form Total Attribution Matrix
Lemma 2.1: Stability of normalization
proof
proof
Lemma 2.2: Convergence of Powers of $A$
proof
...and 2 more
