Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack
SeungBum Ha, Saerom Park, Sung Whan Yoon
TL;DR
This work reveals two overlooked risks in class-level machine unlearning: over-unlearning near the forget boundary and prototypical relearning after unlearning. It introduces OU@ε, a retainment-free metric for collateral damage near forget boundaries, and the Prototypical Relearning Attack (PRA), which exploits residual forget-class structure. The authors propose Spotter, a plug-and-play objective that combines masked distillation around forget-adjacent regions with an intra-class dispersion loss, reducing OU@ε and neutralizing PRA without using retained data. Across CIFAR, TinyImageNet, and CASIA-WebFace, Spotter achieves near-complete forgetting while preserving retained utility, demonstrating practical applicability for privacy-preserving unlearning across diverse architectures, including transformers.
Abstract
Machine unlearning (MU) aims to expunge a designated forget set from a trained model without costly retraining, yet existing techniques overlook two critical blind spots: "over-unlearning", which degrades performance on retained data near the forget set, and post-hoc "relearning" attacks, which aim to resurrect the forgotten knowledge. Focusing on class-level unlearning, we first derive an over-unlearning metric, OU@ε, which quantifies collateral damage in regions proximal to the forget set, where over-unlearning mainly appears. Next, we expose an unforeseen relearning threat to MU, the Prototypical Relearning Attack, which exploits the per-class prototype of the forget class using just a few samples and easily restores pre-unlearning performance. To counter both blind spots in class-level unlearning, we introduce Spotter, a plug-and-play objective that combines (i) a masked knowledge-distillation penalty on the region near the forget classes to suppress OU@ε, and (ii) an intra-class dispersion loss that scatters forget-class embeddings, neutralizing Prototypical Relearning Attacks. Spotter achieves state-of-the-art results on the CIFAR, TinyImageNet, and CASIA-WebFace datasets, offering a practical remedy to unlearning's blind spots.
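To make the idea of a prototype-based relearning attack concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a bias-free linear classification head over L2-normalized embeddings, simulates "unlearning" by overwriting only the forget-class row of the head, and then restores that row from the mean embedding (prototype) of a few forget-class samples. All names (`class_prototype`, `prototypical_relearn`, `sample`) and the synthetic 3-dimensional data are hypothetical and chosen only for illustration.

```python
import math
import random


def normalize(v):
    """L2-normalize a vector (returned unchanged in direction; zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def class_prototype(embeddings):
    """Mean of a few embeddings, L2-normalized: the class prototype."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return normalize(mean)


def predict(head, embedding):
    """Bias-free linear head: return the index of the largest dot product."""
    scores = [sum(w * x for w, x in zip(row, embedding)) for row in head]
    return max(range(len(scores)), key=scores.__getitem__)


def prototypical_relearn(head, forget_class, support_embeddings):
    """Return a copy of the head whose forget-class row is the few-shot prototype."""
    new_head = [list(row) for row in head]
    new_head[forget_class] = class_prototype(support_embeddings)
    return new_head


def accuracy(head, embeddings, label):
    """Fraction of embeddings that the head assigns to `label`."""
    return sum(predict(head, e) == label for e in embeddings) / len(embeddings)


# Toy setup (assumption): three classes whose embeddings cluster around axes.
rng = random.Random(0)
centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def sample(center, n, noise=0.05):
    """Draw n noisy, normalized embeddings around a class center."""
    return [normalize([c + rng.gauss(0.0, noise) for c in center]) for _ in range(n)]


forget_class = 2
head = [list(c) for c in centers]
# Simulated unlearning (assumption): only the forget-class row of the head is
# overwritten, while the embeddings of that class remain tightly clustered.
head[forget_class] = [0.0, 0.0, -1.0]

queries = sample(centers[forget_class], 100)
support = sample(centers[forget_class], 5)  # the attacker's few samples

acc_before = accuracy(head, queries, forget_class)
attacked_head = prototypical_relearn(head, forget_class, support)
acc_after = accuracy(attacked_head, queries, forget_class)
print(f"forget-class accuracy before attack: {acc_before:.2f}, after: {acc_after:.2f}")
```

In this toy setting the attack works because the forget-class embeddings stay clustered after unlearning, so their mean is a usable classifier weight. An intra-class dispersion loss of the kind the abstract describes targets exactly this residual structure: if forget-class embeddings are scattered, a few-shot mean no longer points at a coherent cluster.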
