POUR: A Provably Optimal Method for Unlearning Representations via Neural Collapse
Anjie Le, Can Peng, Yuyuan Liu, J. Alison Noble
TL;DR
This work reframes machine unlearning as a representation-level problem guided by Neural Collapse geometry. It introduces a provably optimal projection-based forgetting operator (POUR) with a closed-form variant (POUR-P) and a distillation-based variant (POUR-D); both preserve the simplex ETF structure among retained classes while erasing the forgotten one. The authors establish theoretical guarantees: (i) ETF geometry implies Bayes-optimality for retained classes, and (ii) orthogonal projection preserves this geometry under forgetting, enabling complete forgetting of the target class in the NC limit. Empirically, POUR achieves superior forgetting and retention across CIFAR-10/100 and PathMNIST, outperforming baselines on both classification- and representation-level metrics, and demonstrating robustness under domain shift. The work provides a principled, geometry-driven approach to representation-level unlearning with practical implications for privacy, safety, and deployment of pretrained models.
Abstract
In computer vision, machine unlearning aims to remove the influence of specific visual concepts or training images without retraining from scratch. Studies show that existing approaches often modify the classifier while leaving internal representations intact, resulting in incomplete forgetting. In this work, we extend the notion of unlearning to the representation level, deriving a three-term interplay between forgetting efficacy, retention fidelity, and class separation. Building on Neural Collapse theory, we show that the orthogonal projection of a simplex Equiangular Tight Frame (ETF) remains an ETF in a lower-dimensional space, yielding a provably optimal forgetting operator. We further introduce the Representation Unlearning Score (RUS) to quantify representation-level forgetting and retention fidelity. Building on these results, we introduce POUR (Provably Optimal Unlearning of Representations), a geometric projection method with a closed-form variant (POUR-P) and a feature-level variant trained under a distillation scheme (POUR-D). Experiments on CIFAR-10/100 and PathMNIST demonstrate that POUR achieves effective unlearning while preserving retained knowledge, outperforming state-of-the-art unlearning methods on both classification-level and representation-level metrics.
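The geometric claim in the abstract, that orthogonally projecting a simplex ETF leaves an ETF among the remaining classes, can be checked numerically. The sketch below is an illustration only and is not the authors' implementation of POUR-P: it assumes idealized class means that already form a perfect simplex ETF (the Neural Collapse limit), and the helper names `simplex_etf`, `project_out`, and `pairwise_cosines` are hypothetical. For K unit vectors with pairwise cosine -1/(K-1), removing the direction of one vector should send that vector to zero and leave K-1 equal-norm vectors with pairwise cosine -1/(K-2).

```python
import numpy as np


def simplex_etf(num_classes: int) -> np.ndarray:
    """Return a (K, K) array whose rows are unit-norm simplex ETF vectors."""
    k = num_classes
    # Centering the identity yields K vectors with equal norms and equal angles.
    centered = np.eye(k) - np.ones((k, k)) / k
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)


def project_out(means: np.ndarray, forget_idx: int) -> np.ndarray:
    """Project each row onto the orthogonal complement of the forgotten class mean."""
    u = means[forget_idx] / np.linalg.norm(means[forget_idx])
    projector = np.eye(means.shape[1]) - np.outer(u, u)  # P = I - u u^T (symmetric)
    return means @ projector


def pairwise_cosines(vectors: np.ndarray) -> np.ndarray:
    """Return the off-diagonal cosine similarities between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    gram = unit @ unit.T
    return gram[~np.eye(len(vectors), dtype=bool)]


K = 10            # assumed number of classes (e.g. a CIFAR-10-sized problem)
FORGET = 3        # assumed index of the class to forget

etf = simplex_etf(K)
projected = project_out(etf, FORGET)
retained = np.delete(projected, FORGET, axis=0)

# The forgotten class mean collapses to the zero vector.
print("forgotten norm:", np.linalg.norm(projected[FORGET]))
# Retained means keep equal norms and equal pairwise cosines of -1/(K-2).
print("retained norms:", np.round(np.linalg.norm(retained, axis=1), 4))
print("retained cosines match -1/(K-2):",
      np.allclose(pairwise_cosines(retained), -1.0 / (K - 2)))
```

The check rests on a short calculation: for retained means m_i and m_j, the projected inner product is ⟨m_i, m_j⟩ - ⟨m_i, u⟩⟨m_j, u⟩ = -K/(K-1)^2, and each projected squared norm is 1 - 1/(K-1)^2 = K(K-2)/(K-1)^2, so their ratio is -1/(K-2). Real features only approximate this geometry, so a practical method would work with estimated class means rather than an exact ETF.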
