On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning

Anvith Thudi, Hengrui Jia, Ilia Shumailov, Nicolas Papernot

TL;DR

The paper shows that the definition underlying approximate unlearning, which seeks to prove that the approximately unlearned model is close to an exactly retrained model, is incorrect: the same model can be obtained from different datasets, so one could claim to unlearn without modifying the model at all.

Abstract

Machine unlearning, i.e., having a model forget some of its training data, has become increasingly important as privacy legislation promotes variants of the right-to-be-forgotten. In the context of deep learning, approaches for machine unlearning are broadly categorized into two classes: exact unlearning methods, where an entity has formally removed the data point's impact on the model by retraining the model from scratch, and approximate unlearning, where an entity approximates the model parameters one would obtain by exact unlearning to save on compute costs. In this paper, we first show that the definition that underlies approximate unlearning, which seeks to prove that the approximately unlearned model is close to an exactly retrained model, is incorrect because one can obtain the same model using different datasets. Thus one could unlearn without modifying the model at all. We then turn to exact unlearning approaches and ask how to verify their claims of unlearning. Our results show that even for a given training trajectory one cannot formally prove the absence of certain data points used during training. We thus conclude that unlearning is only well-defined at the algorithmic level, where an entity's only possible auditable claim to unlearning is that they used a particular algorithm designed to allow for external scrutiny during an audit.
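
The claim that different datasets can yield the same model can be illustrated with a toy sketch. This is not the paper's forging procedure: the one-dimensional loss $\frac{1}{2}(w - x)^2$, the function name `sgd_step`, the learning rate, and the data values are all assumptions chosen so that two minibatches share the same mean and therefore the same averaged gradient.

```python
def sgd_step(w, batch, lr=0.1):
    """One SGD step on the loss 0.5 * (w - x)**2, averaged over the batch."""
    grad = sum(w - x for x in batch) / len(batch)
    return w - lr * grad


w0 = 0.5

# Hypothetical minibatches: the first contains the point 1.0 that a user
# asks to have unlearned, the second does not contain it.
batch_with_point = [1.0, 3.0]
batch_without_point = [0.0, 4.0]

w_a = sgd_step(w0, batch_with_point)
w_b = sgd_step(w0, batch_without_point)

# Both batches have the same mean, so the averaged gradients coincide
# and the two updates land on the same parameter value.
print(w_a == w_b)
```

In this toy setting the resulting parameter alone cannot reveal whether the point 1.0 was used, which is the intuition behind the abstract's argument that closeness to a retrained model does not certify unlearning.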

Paper Structure

This paper contains 44 sections, 9 theorems, 2 equations, 4 figures, 1 table.

Key Result

Lemma 1

If $D$ and $D'$ are forgeable with $\epsilon = 0$, then $H_D(w) = H_{D'}(w)$.

Figures (4)

  • Figure 1: Verification error as a function of the number of samples in the dataset: the error is always on the order of $10^{-6}$, and it decreases slightly as the dataset becomes larger. Note that we used a fixed minibatch size of $1000$, and the colors denote different runs.
  • Figure 2: Verification error as a function of the minibatch size: the error drops drastically as the minibatch size increases from $0$ to $1000$. Afterwards the drop becomes more gradual. Note again the colors denote different runs.
  • Figure 3: Relative verification error as a function of the number of greedy updates (selecting individual points rather than substituting for entire minibatches). The curves are generated by repeating the experiment on 100 individual models, starting from a randomly found minimum. Note how we only observe a marginal benefit by increasing the number of updates. Furthermore, note again that different colors denote different runs.
  • Figure 4: Verification error when forging using samples from a smaller dataset, plotted against the epoch where we are forging. One can see the error is still quite low (less than $10^{-4}$). Note again the different colors denote different runs.

Theorems & Definitions (18)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1: Characterizing $\epsilon = 0$ Equivalence Classes
  • proof
  • Lemma 3: Similar Forgeability
  • proof
  • Lemma 4
  • proof
  • ...and 8 more