
Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data

Stefan Schoepf, Michael Curtis Mozer, Nicole Elyse Mitchell, Alexandra Brintrup, Georgios Kaissis, Peter Kairouz, Eleni Triantafillou

TL;DR

This work introduces a two-dimensional taxonomy of corrupted-data unlearning tasks, defined by discovery rate and data regularity, and shows that existing methods fail outside the regions they target. It then proposes Redirection for Erasing Memory (REM), a universal unlearning method that post-processes a trained model by expanding its capacity with dedicated parameters. Corrupted information is redirected into these parameters, which are discarded after unlearning. REM combines a removal step based on Negative Preference Optimization (NPO), which does not use the retain set, with a mask-based redirection channel that recovers utility while healing corrupted data across low, medium, and high regularity and across discovery rates, outperforming prior approaches. Experiments show robust, model- and dataset-agnostic performance on CIFAR-10 and SVHN with ResNet-9 and ViT. The paper also outlines limitations and directions for future work, such as softer masking and extending the framework to other modalities or to privacy-focused unlearning.
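To make the removal step more concrete, the sketch below writes out the standard NPO objective for a classifier. It is a minimal illustration, not the paper's implementation: `npo_loss` is a hypothetical helper that takes per-sample probabilities of the (corrupted) training labels instead of a real model, and the default `beta` is illustrative. REM's exact variant and hyperparameters may differ.

```python
import math
from typing import Sequence


def npo_loss(p_model: Sequence[float], p_ref: Sequence[float], beta: float = 1.0) -> float:
    """Negative Preference Optimization (NPO) loss over a forget set.

    p_model[i]: probability the model being unlearned assigns to the label
                of forget sample i.
    p_ref[i]:   the same probability under the frozen pre-unlearning model.

    L = -(2 / beta) * mean_i log sigmoid(-beta * log(p_model[i] / p_ref[i]))
      =  (2 / beta) * mean_i softplus(beta * log(p_model[i] / p_ref[i]))
    """
    if len(p_model) != len(p_ref) or len(p_model) == 0:
        raise ValueError("need two non-empty probability lists of equal length")
    if beta <= 0:
        raise ValueError("beta must be positive")
    total = 0.0
    for pm, pr in zip(p_model, p_ref):
        z = beta * (math.log(pm) - math.log(pr))
        # Numerically stable softplus(z) = log(1 + exp(z)).
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return (2.0 / beta) * total / len(p_model)
```

When `p_model` equals `p_ref`, the loss is (2/beta) log 2. Plain gradient ascent minimizes log p, which is unbounded below. This loss, by contrast, is bounded below by zero. Its gradient with respect to the log-ratio is a sigmoid that vanishes as a sample's probability approaches zero, so already-forgotten samples stop dominating the update.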

Abstract

Machine unlearning is studied for a multitude of tasks, but the specialization of unlearning methods to particular tasks has made their systematic comparison challenging. To address this issue, we propose a conceptual space to characterize diverse corrupted-data unlearning tasks in vision classifiers. This space is described by two dimensions: the discovery rate (the fraction of the corrupted data that are known at unlearning time) and the statistical regularity of the corrupted data (from random exemplars to shared concepts). Previously proposed methods target portions of this space and, as we show, fail predictably outside these regions. We propose a novel method, Redirection for Erasing Memory (REM), whose key feature is redirecting corrupted data to dedicated neurons that are introduced at unlearning time and then discarded or deactivated to suppress the influence of the corrupted data. REM performs strongly across the space of tasks, in contrast to prior SOTA methods that fail outside the regions for which they were designed.
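The discovery-rate axis can be illustrated with the data split it implies. The following is a hypothetical sketch, assuming that discovered samples are drawn uniformly at random from the corrupted set (the benchmarks may select them differently). The unlearning method receives only the discovered fraction as the forget set, while the retain set still contains every undiscovered corrupted sample.

```python
import random
from typing import List, Sequence, Set, Tuple


def split_by_discovery_rate(
    n_train: int,
    corrupted: Sequence[int],
    discovery_rate: float,
    seed: int = 0,
) -> Tuple[List[int], List[int], Set[int]]:
    """Hypothetical helper returning (forget_idx, retain_idx, undiscovered_idx).

    forget_idx:       discovered corrupted samples, a `discovery_rate`
                      fraction of all corrupted samples.
    retain_idx:       every other training sample; this still includes the
                      undiscovered corrupted samples.
    undiscovered_idx: corrupted samples the unlearning method never sees.
    """
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_found = round(discovery_rate * len(corrupted))
    # Assumption: discovered samples are a uniform random subset.
    forget = sorted(rng.sample(list(corrupted), n_found))
    forget_set = set(forget)
    retain = [i for i in range(n_train) if i not in forget_set]
    undiscovered = set(corrupted) - forget_set
    return forget, retain, undiscovered
```

For example, take 1000 corrupted samples among 50,000 training images (illustrative numbers) and a 30% discovery rate. In that case, `split_by_discovery_rate(50000, list(range(1000)), 0.3)` yields a 300-sample forget set and a 49,700-sample retain set that still holds 700 corrupted samples. This leftover corruption is why fine-tuning on the retain set can reintroduce what was just removed.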

Paper Structure

This paper contains 14 sections, 2 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: We present a new taxonomy of unlearning tasks along two dimensions: the regularity and the discovery rate of the corrupted data to be unlearned. The highlighted areas in (a) show tasks studied in prior work. Subplots (b-e) show an aggregate unlearning-performance metric (defined in the background section) for different methods across discovery rates >0% (x-axis) and regularities (y-axis), instantiated via the benchmarks of Goel et al. Darker color indicates better performance. Prior methods succeed only in slices of this 2D space, failing mainly along the regularity axis.
  • Figure 2: REM performs the following steps: (i) expand the network with randomly initialized parameters $\theta_{o_2}$; (ii) remove the corruptions from $\theta_{o_1}$ with a SOTA unlearning algorithm that does not use $\mathcal{D}_{\text{r}}$, which avoids reintroducing corruptions into $\theta_{o_1}$ at the expense of utility; (iii) repair utility by fine-tuning $\theta_{o_1}$ on $\mathcal{D}_{\text{tr}}$, using a novel Redirection strategy that steers any corruptions reintroduced by the inclusion of $\mathcal{D}_{\text{r}}$ into the add-on parameters $\theta_{o_2}$; (iv) drop $\theta_{o_2}$.
  • Figure 3: Comparison of unlearning (UL) methods at two model capacity levels using ResNet-9 and CIFAR-10 with 1000 corrupted samples, three regularity levels, and 10 discovery rates (10%-100%). REM (IDEAL) provides an upper limit, using perfect knowledge of the manipulated samples for mask assignment. ETD and REM are not reported at 100% capacity, as reserve model capacity is needed for $\theta_{o_2}$. Error bars show $\pm1$ SEM.
  • Figure 4: Comparison of REM applied to a model trained with and without ETD. The performance of the model before REM is applied is shown in the 0.0 discovery column. (b) shows that ETD improves performance at lower discovery rates for lower-regularity tasks (y-axis), but at the cost of overall model utility (see the full results table), which harms performance at higher regularity and higher discovery rates.
  • Figure 5: Healing accuracy on the train and test sets for all ResNet-9 experiments in the paper, covering all 10 discovery rates, each unlearning task (three regularity levels), and all UL methods. Healing of corrupted data is highly correlated between train and test data for high-regularity corruptions, but shows little correlation for low-regularity manipulations such as Rand. Label Swap. Models destroyed during unlearning (model utility below 80%) were excluded.
  • ...and 5 more figures
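The redirection idea can be illustrated with a toy hidden layer. This is a minimal, hypothetical sketch: `ExpandedLayer`, the per-sample `redirect` flag, and the hard 0/1 gate are assumptions made for illustration. The paper describes a masking-based redirection, but its exact mask assignment and gating are not reproduced here. Original units stand in for $\theta_{o_1}$ and add-on units for $\theta_{o_2}$. Samples flagged for redirection can use the add-on units during repair fine-tuning, and the add-on units are deactivated afterwards.

```python
from typing import List, Sequence


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def dense_relu(weights: List[List[float]], x: Sequence[float]) -> List[float]:
    # One ReLU unit per weight row (biases omitted for brevity).
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in weights]


class ExpandedLayer:
    """Hypothetical hidden layer: original units (theta_o1) plus add-on units
    (theta_o2) that are created at unlearning time."""

    def __init__(self, w_o1: List[List[float]], w_o2: List[List[float]]) -> None:
        self.w_o1 = w_o1  # weights kept after unlearning
        self.w_o2 = w_o2  # add-on weights, randomly initialized in REM step (i)
        self.addon_active = True

    def forward(self, x: Sequence[float], redirect: bool) -> List[float]:
        """Hidden activations for one sample.

        The add-on units pass their output only when `redirect` is True
        (a hard per-sample mask) and they have not been deactivated, so the
        output for samples with redirect=False never depends on theta_o2.
        """
        h_o1 = dense_relu(self.w_o1, x)
        h_o2 = dense_relu(self.w_o2, x)
        gate = 1.0 if (redirect and self.addon_active) else 0.0
        return h_o1 + [gate * h for h in h_o2]

    def deactivate_addon(self) -> None:
        # Step (iv): switch off theta_o2 for every input, removing whatever
        # was redirected into it while keeping the output width unchanged.
        self.addon_active = False


layer = ExpandedLayer([[1.0, 0.0]], [[0.0, 2.0]])
print(layer.forward([3.0, 1.0], redirect=True))   # add-on unit active
print(layer.forward([3.0, 1.0], redirect=False))  # add-on unit masked
layer.deactivate_addon()
print(layer.forward([3.0, 1.0], redirect=True))   # add-on unit switched off
```

Zeroing the add-on activations (deactivation) keeps the layer's output width fixed, so the next layer needs no change. Physically discarding $\theta_{o_2}$ would also require deleting the matching input weights downstream.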