Erase at the Core: Representation Unlearning for Machine Unlearning

Jaewon Lee; Yongwoo Kim; Donghyun Kim

TL;DR

This work addresses superficial forgetting in machine unlearning: suppressing forget-set logits often leaves intermediate representations largely unchanged. It introduces Erase at the Core (EC), a multi-layer, representation-focused framework that attaches auxiliary modules to intermediate layers and applies layer-wise contrastive unlearning plus deep supervision to propagate forgetting through the depth of the network. EC achieves stronger representation-level forgetting than prior methods on ImageNet-1K and CIFAR-100 across architectures while preserving retain-set performance, and it works as a model-agnostic plug-in that strengthens other unlearning baselines. The results highlight the practical importance of enforcing forgetting at the core of the network rather than only at the classifier, though formal erasure guarantees and computational overhead remain open questions for future work.

Abstract

Many approximate machine unlearning methods demonstrate strong logit-level forgetting (such as near-zero accuracy on the forget set) yet continue to preserve substantial information within their internal feature representations. We refer to this discrepancy as superficial forgetting. Recent studies indicate that most existing unlearning approaches primarily alter the final classifier, leaving intermediate representations largely unchanged and highly similar to those of the original model. To address this limitation, we introduce Erase at the Core (EC), a framework designed to enforce forgetting throughout the entire network hierarchy. EC integrates multi-layer contrastive unlearning on the forget set with retain-set preservation through deeply supervised learning. Concretely, EC attaches auxiliary modules to intermediate layers and applies both contrastive unlearning and cross-entropy losses at each supervision point, with the per-layer losses combined through layer-wise weights. Experimental results show that EC not only achieves effective logit-level forgetting, but also substantially reduces representational similarity to the original model across intermediate layers. Furthermore, EC is model-agnostic and can be incorporated as a plug-in module into existing unlearning methods, improving representation-level forgetting while maintaining performance on the retain set.
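The layer-wise weighted objective described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the contrastive unlearning loss, so `contrastive_unlearning` below is a hypothetical InfoNCE-style stand-in, and the names `ec_style_objective`, `per_layer`, and `layer_weights` are invented for this example. Only the overall shape (a forget-set contrastive term plus a retain-set cross-entropy term at each supervision point, combined with per-layer weights) follows the text.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of retain-set samples."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def contrastive_unlearning(forget_feats, retain_feats, temperature=0.5):
    """Hypothetical stand-in (assumption, not the paper's loss).

    InfoNCE-style term: retain features act as positives and the other
    forget features act as negatives, so minimizing it pulls forget
    features toward retain features and apart from each other.
    """
    f = forget_feats / np.linalg.norm(forget_feats, axis=1, keepdims=True)
    r = retain_feats / np.linalg.norm(retain_feats, axis=1, keepdims=True)
    pos = np.exp(f @ r.T / temperature).sum(axis=1)
    neg_all = np.exp(f @ f.T / temperature)
    neg = neg_all.sum(axis=1) - np.diag(neg_all)  # drop self-similarity
    return float(-np.log(pos / (pos + neg)).mean())


def ec_style_objective(per_layer, layer_weights):
    """Weighted sum over supervision points of (forget term + retain term)."""
    total = 0.0
    for point, weight in zip(per_layer, layer_weights):
        forget_term = contrastive_unlearning(
            point["forget_feats"], point["retain_feats"]
        )
        retain_term = cross_entropy(point["retain_logits"], point["retain_labels"])
        total += weight * (forget_term + retain_term)
    return total
```

Each entry of `per_layer` stands for one supervision point (an auxiliary module attached to an intermediate layer, or the final head) and carries the features and logits computed there. How the weights are scheduled across depth is not specified in the abstract, so they are left as a free argument here.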

Paper Structure

This paper contains 31 sections, 5 equations, 4 figures, 11 tables, and 1 algorithm.

Introduction
Related Work
Machine Unlearning
Unlearning Evaluation: Logit-based and Representation-based Evaluation
Representation Learning Across Intermediate Layers
Method: Erase at the Core
Preliminaries and Problem Definition
Architectures
Unlearning Objectives
Experiments
Experimental Setup
Experimental Results
Ablation Study
Conclusion
Additional Results
...and 16 more sections

Figures (4)

Figure 1: Illustration of Erase at the Core (EC). EC attaches EC Modules to intermediate layers and applies layer-wise contrastive unlearning loss along with the cross-entropy loss. Here, $L$ denotes the number of layers in the backbone, and the EC module at each Conv Block in ResNet is repeated $(L-k)$ times.
Figure 2: Layer-wise representational similarity to the original model, measured by CKA on the test forget set. We compare features from the output of ResNet-50 Layer 4 (i.e., Layer 4.2) and from the two preceding Layer-4 bottleneck blocks (Layers 4.0 and 4.1); lower CKA indicates larger deviation from the original model.
Figure 3: t-SNE visualization of the pooled feature representation before the final classifier. Red stars are the forget class, circles are 9 retain classes with the highest similarity to the forget class.
Figure 4: Comparison of k-NN retrieval results for a query image from the forget set under the ImageNet-1K, ResNet-50, random 100-class forgetting setup. All four models retrieve the same class (Rottweiler) at top-1, but the Retrained and EC models return the same image, while the Original model and DELETE return a different one. Across the top-5, DELETE retrieves the same images as the Original model, differing only in order, which indicates minimal representation-level change. From top-2 onward, the EC model's retrievals align more closely with those of the Retrained model than with those of the Original model or DELETE.
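For readers unfamiliar with the CKA score used in Figure 2, the sketch below computes the linear variant of centered kernel alignment between two feature matrices extracted from the same inputs. Whether the paper uses linear or kernel CKA is not stated in this section, so the linear form is an assumption, and the function name `linear_cka` is chosen for this example only.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between feature matrices of shape (n_samples, dim).

    x and y must hold features for the same samples in the same order;
    their feature dimensions may differ. Returns a value in [0, 1],
    where 1 means the representations match up to rotation and scale.
    """
    # Center each feature dimension over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

In the setting of Figure 2, `x` would hold a layer's features from the original model and `y` the same layer's features from the unlearned model, both computed on the test forget set. A score near 1 would then suggest that the layer's representation is essentially unchanged, which is the superficial-forgetting pattern the paper describes.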
