Erase at the Core: Representation Unlearning for Machine Unlearning
Jaewon Lee, Yongwoo Kim, Donghyun Kim
TL;DR
This work tackles superficial forgetting in machine unlearning by showing that reducing forget-set logits often leaves intermediate representations largely unchanged. It introduces Erase at the Core (EC), a multi-layer, representation-focused framework that attaches auxiliary modules to intermediate layers and applies layer-wise contrastive unlearning plus deep supervision to spread forgetting across the network's depth. On ImageNet-1K and CIFAR-100, across architectures, EC achieves stronger representation-level forgetting than prior methods while preserving retain-set performance, and it works as a model-agnostic plug-in that strengthens other unlearning baselines. The results highlight the practical importance of enforcing forgetting at the core of the network rather than only at the output, though formal erasure guarantees and computational overhead remain avenues for future work.
Abstract
Many approximate machine unlearning methods demonstrate strong logit-level forgetting, such as near-zero accuracy on the forget set, yet continue to preserve substantial information within their internal feature representations. We refer to this discrepancy as superficial forgetting. Recent studies indicate that most existing unlearning approaches primarily alter the final classifier, leaving intermediate representations largely unchanged and highly similar to those of the original model. To address this limitation, we introduce Erase at the Core (EC), a framework designed to enforce forgetting throughout the entire network hierarchy. EC combines multi-layer contrastive unlearning on the forget set with retain-set preservation through deeply supervised learning. Concretely, EC attaches auxiliary modules to intermediate layers and applies both a contrastive unlearning loss and a cross-entropy loss at each supervision point, weighting the losses layer by layer. Experimental results show that EC not only achieves effective logit-level forgetting but also substantially reduces representational similarity to the original model across intermediate layers. Furthermore, EC is model-agnostic and can be incorporated as a plug-in module into existing unlearning methods, improving representation-level forgetting while maintaining performance on the retain set.
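To make the layer-wise weighted objective concrete, the following is a minimal sketch of how per-supervision-point losses might be combined. It is not the authors' implementation. The function names, the `forget_coeff` parameter, and the dictionary layout are hypothetical, and `forget_term` is a simple cosine-similarity stand-in because the abstract does not specify the exact form of the contrastive unlearning loss.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def forget_term(forget_feats, retain_feats):
    """ASSUMPTION: placeholder for the contrastive unlearning loss.

    Returns the mean cosine similarity between forget-set and retain-set
    features, so minimizing it pushes the two groups apart.
    """
    sims = [cosine(f, r) for f in forget_feats for r in retain_feats]
    return sum(sims) / len(sims)

def ec_style_loss(points, layer_weights, forget_coeff=1.0):
    """Weighted sum over supervision points of (retain CE + forget term).

    points: one dict per supervision point (hypothetical layout) with keys
        'retain_logits', 'retain_labels', 'retain_feats', 'forget_feats'.
    layer_weights: one scalar weight per supervision point.
    """
    total = 0.0
    for point, weight in zip(points, layer_weights):
        ce_values = [
            cross_entropy(logits, label)
            for logits, label in zip(point["retain_logits"], point["retain_labels"])
        ]
        retain_ce = sum(ce_values) / len(ce_values)
        forget = forget_term(point["forget_feats"], point["retain_feats"])
        total += weight * (retain_ce + forget_coeff * forget)
    return total

# Usage: two supervision points, the deeper one weighted more heavily.
points = [
    {"retain_logits": [[0.0, 0.0]], "retain_labels": [0],
     "retain_feats": [[0.0, 1.0]], "forget_feats": [[1.0, 0.0]]},
    {"retain_logits": [[0.0, 0.0]], "retain_labels": [1],
     "retain_feats": [[1.0, 0.0]], "forget_feats": [[1.0, 0.0]]},
]
loss = ec_style_loss(points, layer_weights=[0.5, 1.0])
```

In a real training loop each supervision point would be the output of an auxiliary module attached to an intermediate layer, and the weights would be a hyperparameter choice; both are left abstract here.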
