Unlearning via Sparse Representations
Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal
TL;DR
The paper tackles machine unlearning under practical compute constraints by introducing a Discrete Key-Value Bottleneck (DKVB) that yields sparse, localized representations. It proposes two zero-shot unlearning methods, Unlearning via Activations and Unlearning via Examples, which remove information about a forget class by masking selected key-value pairs, without retraining. Across CIFAR-10, CIFAR-100, LACUNA-100, and ImageNet-1k, with backbones such as CLIP ViT-B/32 and ResNet-50, the approach achieves complete forget-class unlearning while preserving retain-class performance, and it requires substantially fewer FLOPs than SCRUB. The results indicate that built-in sparsity supports robust, compute-efficient unlearning that is applicable to large-scale models. The paper also outlines limitations and directions for future work on end-to-end sparse training and selective unlearning scenarios.
Abstract
Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can be costly or even infeasible with existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the data set. We evaluate the proposed technique on the problem of class unlearning using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.
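To make the masking idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single codebook whose keys are matched by nearest Euclidean distance and whose values are per-key class logits. The function names (`nearest_keys`, `predict`, `unlearn_by_masking`), the two-dimensional features, and the hand-picked keys and values are all hypothetical and chosen only for illustration. The sketch masks every key that the forget examples select, which is one simple reading of masking "selected key-value pairs"; the paper's own selection rules may differ.

```python
import numpy as np

def nearest_keys(z, keys, active):
    """Index of the nearest non-masked key for each row of z."""
    # Squared Euclidean distance from every input to every key.
    d = ((z[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    d[:, ~active] = np.inf  # masked keys can never be selected
    return d.argmin(axis=1)

def predict(z, keys, values, active):
    """Predicted class from the value (logits) stored with each selected key."""
    return values[nearest_keys(z, keys, active)].argmax(axis=1)

def unlearn_by_masking(z_forget, keys, active):
    """Mask every key selected by the forget examples; no gradient steps."""
    new_active = active.copy()
    new_active[np.unique(nearest_keys(z_forget, keys, active))] = False
    return new_active

# Hypothetical toy bottleneck: keys 0-1 serve class 0, keys 2-3 serve class 1.
keys = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
values = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
active = np.ones(len(keys), dtype=bool)

z_retain = np.array([[0.1, 0.1], [0.0, 0.9]])     # class 0 features (retain)
z_forget = np.array([[9.9, 10.0], [10.1, 11.2]])  # class 1 features (forget)

before = predict(z_forget, keys, values, active)
active_after = unlearn_by_masking(z_forget, keys, active)
after_forget = predict(z_forget, keys, values, active_after)
after_retain = predict(z_retain, keys, values, active_after)
```

In this toy setup the forget inputs can no longer reach any value that encodes class 1, while the retain inputs still select the same keys as before. Because the information is localized in a few key-value pairs, the only cost of unlearning here is a forward pass over the forget examples.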
