Exploring and Leveraging Class Vectors for Classifier Editing

Jaeik Kim, Jaeyoung Do

TL;DR

This work introduces Class Vectors, per-class latent-space adapters that capture how each class's representation shifts during fine-tuning: $\kappa_c = z^{c}_\text{ft} - z^{c}_\text{pre}$. By exploiting Cross-Task Linearity (CTL) and Neural Collapse, the authors show that these vectors enable linear, independent, and data-efficient edits, applied either through latent-space steering or through weight-space mapping, without full retraining. The method is reported to perform strongly on class unlearning, environment adaptation, defense against typography attacks, and backdoor-trigger optimization, while remaining lightweight and architecture-agnostic across ViT, CNN, and language models. The approach offers a scalable, interpretable route to personalized or domain-specific classifier editing, with practical implications for safety, robustness, and efficiency.
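
The definition $\kappa_c = z_c^{ft} - z_c^{pre}$ can be illustrated with a minimal NumPy sketch. It assumes that each class representation is the mean (centroid) of encoder features for that class, as Figure 1(a) suggests, and that features from the pre-trained and fine-tuned encoders have already been extracted. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def class_centroids(features, labels, num_classes):
    """Mean feature vector per class. features: (N, D), labels: (N,)."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def class_vectors(feats_pre, feats_ft, labels, num_classes):
    """Row c holds kappa_c = centroid_ft[c] - centroid_pre[c]."""
    return (class_centroids(feats_ft, labels, num_classes)
            - class_centroids(feats_pre, labels, num_classes))

# Toy data (assumption): 3 classes, 4 samples each, 5-d features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)
feats_pre = rng.normal(size=(12, 5))

# Pretend fine-tuning shifts every sample of class c by a fixed offset.
shift = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0, 0.0, 0.0]])
feats_ft = feats_pre + shift[labels]

kappa = class_vectors(feats_pre, feats_ft, labels, num_classes=3)
```

In this toy setting each recovered row of `kappa` matches the per-class offset that was injected, which is the behaviour the definition describes.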

Abstract

Image classifiers play a critical role in detecting diseases in medical imaging and identifying anomalies in manufacturing processes. However, their predefined behaviors after extensive training make post hoc model editing difficult, especially when it comes to forgetting specific classes or adapting to distribution shifts. Existing classifier editing methods either focus narrowly on correcting errors or incur extensive retraining costs, creating a bottleneck for flexible editing. Moreover, such editing has seen limited investigation in image classification. To overcome these challenges, we introduce Class Vectors, which capture class-specific representation adjustments during fine-tuning. Whereas task vectors encode task-level changes in weight space, Class Vectors disentangle each class's adaptation in the latent space. We show that Class Vectors capture each class's semantic shift and that classifier editing can be achieved either by steering latent features along these vectors or by mapping them into weight space to update the decision boundaries. We also demonstrate that the inherent linearity and orthogonality of Class Vectors support efficient, flexible, and high-level concept editing via simple class arithmetic. Finally, we validate their utility in applications such as unlearning, environmental adaptation, adversarial defense, and adversarial trigger optimization.
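
As a rough illustration of steering latent features along Class Vectors, the sketch below shifts a feature by an editing vector built with class arithmetic and then re-applies a linear head. The identity head, the toy class vectors, the `strength` parameter, and the choice to steer every given feature are assumptions made for the example; the paper's actual steering procedure may differ.

```python
import numpy as np

def steer(z, z_edit, strength=1.0):
    """Shift latent features z (N, D) along an editing vector z_edit (D,)."""
    return z + strength * z_edit

def predict(z, head):
    """Linear classification head: argmax over class logits."""
    return np.argmax(z @ head.T, axis=1)

# Toy setup (hypothetical): 3 classes in a 3-d latent space, identity head.
head = np.eye(3)
kappa = 2.0 * np.eye(3)            # pretend Class Vectors, one row per class
z = np.array([[2.0, 0.1, 0.0]])    # a feature the head assigns to class 0

# Class arithmetic: leave class 0's adaptation and follow class 1's instead.
z_edit = kappa[1] - kappa[0]

before = predict(z, head)
after = predict(steer(z, z_edit), head)
```

Here the unedited feature is assigned to class 0 and the steered feature to class 1. The encoder and head weights are untouched, which is what makes latent steering cheap compared with retraining.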

Paper Structure

This paper contains 45 sections, 3 theorems, 32 equations, 14 figures, 20 tables, 4 algorithms.

Key Result

Theorem 3.1

Suppose the function $f:\mathbb{R}^p \to \mathbb{R}$ and two fine-tuned weights $\theta_i$ and $\theta_j$ satisfy CTL (Zhou et al., 2024). Let $\theta_{\mathrm{pre}}$ be the pre-trained weights. Define If $\|\theta_i-\theta_{\mathrm{pre}}\| < \|\theta_i-\theta_j\|$, then $\delta_{\mathrm{pre},i} < \delta_{i,j}$: the segment from $\theta_{\mathrm{pre}}$ to $\theta_i$ shows strictly smaller C

Figures (14)

  • Figure 1: Class Vector and its applications. (a) Class Vector captures centroid representation adaptation in the latent space. (b) Editing vector with high-level concepts using arithmetic operations on Class Vectors. (c) Editing vectors can undo predictive behaviors by reversing the adaptation direction, or transition the classifier logic to correct errors.
  • Figure 2: (a) Line search between $z^c_\text{pre}$ and $z^c_\text{ft}$ explores the linearly evolving representation. (b) Linear interpolation between cross-Class Vectors with ViT-B/32 shows a smooth transition between classes.
  • Figure 3: Independence of Class Vectors on MNIST. (a) Scaling the target class representation using $z_\text{edit} = \alpha \cdot \kappa_{c_1}$. (b) Adding non-target Class Vectors to the target class, by combination count. (c) Modifying the target class to each destination class ($z_\text{edit} = \kappa_{\text{des.}}-\kappa_\text{tar.}$), with the averaged task accuracy. (d) Shifting all representations from $c_i \to c_{i+1}$ simultaneously, with transition success rate.
  • Figure 3: Results on backdoor attacks with optimized triggers.
  • Figure 4: Adapting the classifier to a snowy environment. Red text marks the misclassifications made by the original model, while blue text shows the correct predictions after classifier editing.
  • ...and 9 more figures
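
The editing vectors named in the captions of Figures 2 and 3 can be written down in a few lines. The sketch below is only an illustration of that arithmetic: the helper names are hypothetical, the toy `kappa` matrix is made up, and the wrap-around from the last class back to the first in the simultaneous shift is an assumption.

```python
import numpy as np

def interpolate(z_pre, z_ft, alphas):
    """Figure 2(a): points z_pre + alpha * (z_ft - z_pre), one row per alpha."""
    a = np.asarray(alphas, dtype=float)[:, None]
    return z_pre + a * (z_ft - z_pre)

def scaled_edit(kappa, target, alpha):
    """Figure 3(a): z_edit = alpha * kappa_target."""
    return alpha * kappa[target]

def combined_edit(kappa, others):
    """Figure 3(b): sum of the non-target Class Vectors listed in `others`."""
    return kappa[list(others)].sum(axis=0)

def transfer_edit(kappa, target, destination):
    """Figure 3(c): z_edit = kappa_destination - kappa_target."""
    return kappa[destination] - kappa[target]

def cyclic_edits(kappa):
    """Figure 3(d): row i is kappa_{i+1} - kappa_i (last class wraps to first)."""
    return np.roll(kappa, -1, axis=0) - kappa

# Toy Class Vectors (assumption): 3 classes, mutually orthogonal.
kappa = np.eye(3)

path = interpolate(np.zeros(3), kappa[0], [0.0, 0.5, 1.0])
undo = scaled_edit(kappa, target=0, alpha=-1.0)
mixed = combined_edit(kappa, others=[1, 2])
move = transfer_edit(kappa, target=0, destination=2)
shift_all = cyclic_edits(kappa)
```

Each helper returns a vector (or a matrix of vectors) in the latent space, so any of them can be added to features before the classification head or passed to a weight-space mapping.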

Theorems & Definitions (7)

  • Definition 3.1: Class Vector
  • Theorem 3.1: CTL between pretrained and fine-tuned weights
  • Theorem 3.2: Existence of a Mapping
  • Theorem 3.3: Independence of Class Vectors
  • proof
  • proof
  • proof