Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
Stefan Schoepf, Michael Curtis Mozer, Nicole Elyse Mitchell, Alexandra Brintrup, Georgios Kaissis, Peter Kairouz, Eleni Triantafillou
TL;DR
This work introduces a two-dimensional taxonomy of corrupted-data unlearning tasks, defined by discovery rate and data regularity, and shows that existing methods fail outside their target regions. It then proposes Redirection for Erasing Memory (REM), a universal unlearning method that post-processes a trained model: it expands the model's capacity with dedicated parameters, redirects corrupted information to them, and discards those parameters after unlearning. REM combines a non-retain-set removal step based on Negative Preference Optimization with a masking-based redirection channel, recovering utility while healing the model from corrupted data across low, medium, and high regularity and discovery-rate regimes, and it outperforms prior approaches. The results show robust performance across models and datasets (CIFAR-10 and SVHN with ResNet-9 and ViT). The work also outlines limitations and avenues for improvement, such as softer masking and extending the framework to other modalities or to privacy-focused unlearning.
Abstract
Machine unlearning is studied for a multitude of tasks, but the specialization of unlearning methods to particular tasks has made their systematic comparison challenging. To address this issue, we propose a conceptual space to characterize diverse corrupted-data unlearning tasks in vision classifiers. This space is described by two dimensions: the discovery rate (the fraction of the corrupted data that is known at unlearning time) and the statistical regularity of the corrupted data (from random exemplars to shared concepts). Previously proposed methods target portions of this space and, as we show, fail predictably outside those regions. We propose a novel method, Redirection for Erasing Memory (REM), whose key feature is that corrupted data are redirected to dedicated neurons that are introduced at unlearning time and then discarded or deactivated to suppress the influence of the corrupted data. REM performs strongly across the space of tasks, in contrast to prior SOTA methods that fail outside the regions for which they were designed.
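To make the discovery-rate dimension concrete, the sketch below splits a set of corrupted example ids into a discovered part (known at unlearning time) and an undiscovered part. It is an illustration only, not the paper's experimental code: the function name `split_by_discovery_rate`, the seeded random selection, and the example ids are all assumptions.

```python
import random


def split_by_discovery_rate(corrupted_ids, discovery_rate, seed=0):
    """Split corrupted example ids into (discovered, undiscovered).

    Hypothetical helper: `discovery_rate` is the fraction of the corrupted
    data assumed to be known at unlearning time. Picking the discovered
    subset uniformly at random is an assumption of this sketch.
    """
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must lie in [0, 1]")
    ids = sorted(corrupted_ids)          # fixed order before shuffling
    rng = random.Random(seed)            # local RNG for reproducibility
    rng.shuffle(ids)
    n_known = round(discovery_rate * len(ids))
    return ids[:n_known], ids[n_known:]


corrupted = list(range(100, 120))        # 20 made-up corrupted example ids
known, unknown = split_by_discovery_rate(corrupted, discovery_rate=0.25)
print(len(known), len(unknown))          # 5 discovered, 15 undiscovered
```

In this picture, an unlearning method is given only the discovered ids, while the remaining corrupted examples still sit unidentified in the training data; a discovery rate of 1.0 corresponds to the case where all corrupted data are known.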
