Unrolling SGD: Understanding Factors Influencing Machine Unlearning

Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, Nicolas Papernot

TL;DR

This paper examines machine unlearning and compares retraining with approximate approaches, introducing verification error as a unifying metric that subsumes many other unlearning criteria. By expanding SGD via a Taylor-series analysis, it derives a computable proxy, the unlearning error, and proposes single gradient unlearning as a lightweight update mechanism that targets the dominant terms tied to the unlearned datapoint. To further reduce unlearning error, the authors introduce the Standard Deviation (SD) loss, which limits weight changes during training and preserves singular-value structure, enabling more effective future unlearning with lower verification error. Empirically, SD loss reduces unlearning and verification errors on CIFAR-10, CIFAR-100, and IMDB, with strong cross-domain correlations and modest accuracy costs, suggesting practical pathways for scalable, certifiable data deletion in deep learning. The work also analyzes trade-offs with privacy metrics such as the privacy risk score (PRS) and benchmarks costs against SISA, highlighting the potential and limitations of approximate unlearning in real-world deployments.

Abstract

Machine unlearning is the process through which a deployed machine learning model is made to forget about some of its training data points. While naively retraining the model from scratch is an option, it is almost always associated with large computational overheads for deep learning models. Thus, several approaches to approximately unlearn have been proposed along with corresponding metrics that formalize what it means for a model to forget about a data point. In this work, we first taxonomize approaches and metrics of approximate unlearning. As a result, we identify verification error, i.e., the L2 difference between the weights of an approximately unlearned and a naively retrained model, as an approximate unlearning metric that should be optimized for as it subsumes a large class of other metrics. We theoretically analyze the canonical training algorithm, stochastic gradient descent (SGD), to surface the variables which are relevant to reducing the verification error of approximate unlearning for SGD. From this analysis, we first derive an easy-to-compute proxy for verification error (termed unlearning error). The analysis also informs the design of a new training objective penalty that limits the overall change in weights during SGD and as a result facilitates approximate unlearning with lower verification error. We validate our theoretical work through an empirical evaluation on learning with CIFAR-10, CIFAR-100, and IMDB sentiment analysis.
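
The abstract defines verification error as the L2 difference between the weights of an approximately unlearned model and a naively retrained one. The following is a minimal sketch of that definition, not the authors' code; the function name `verification_error`, the representation of weights as a list of flat per-layer lists, and the toy values are all assumptions made for illustration.

```python
import math


def verification_error(unlearned_weights, retrained_weights):
    """L2 distance between two weight sets, each a list of flat per-layer lists."""
    if len(unlearned_weights) != len(retrained_weights):
        raise ValueError("models must have the same number of layers")
    squared = []
    for u_layer, r_layer in zip(unlearned_weights, retrained_weights):
        if len(u_layer) != len(r_layer):
            raise ValueError("layers must have the same number of parameters")
        # Accumulate squared differences across every layer, as if all
        # parameters were concatenated into one long vector.
        squared.extend((u - r) ** 2 for u, r in zip(u_layer, r_layer))
    return math.sqrt(math.fsum(squared))


# Hypothetical toy weights (not taken from the paper).
approx = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5]]
retrained = [[1.0, 2.0, 3.0, 1.0], [0.5, 3.5]]
error = verification_error(approx, retrained)
```

Treating all layers as one concatenated vector keeps the metric a single scalar per model pair, which is what makes it usable as a common yardstick across unlearning methods.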

Paper Structure

This paper contains 29 sections, 2 theorems, 22 equations, 16 figures, 4 tables.

Key Result

Lemma 1

If every $\mathbb{P}_I$ is Lipschitz with Lipschitz constant $L$ (which is true if $\mathbf{g}$ is Gaussian), and if we let $d = \frac{1}{n!^m}\sum_{I} \|\mathbf{d}_I\|_{2}$, then:

Figures (16)

  • Figure 1: Taxonomy of prior work on post-hoc (post-training) approximate (avoiding retraining) unlearning methods. Unlearning methods are categorized in two ways: (1) What is the definition of unlearning used to motivate the removal of information (horizontal axis)? (2) How is the success of the unlearning method measured (vertical axis)?
  • Figure 2: Correlation between privacy risk score and verification error in different training setups for CIFAR-10 and CIFAR-100. The correlations are -0.29 and -0.02, respectively.
  • Figure 3: Unlearning error for 4 different settings as a function of the number of finetuning steps with no pretraining. Across all 4 settings, the unlearning error increases as a function of $t$. Each model was trained for $t = 7812$ steps (5 epochs).
  • Figure 4: Unlearning error of 4 setups trained with increasing strength of $\ell_2$ regularization. As shown, weight decay does not decrease the unlearning error consistently.
  • Figure 5: Plots of the negative gradients of SD loss with respect to the two outputs ($a = \text{out 1}$, $b = \text{out 2}$). The black line represents $a = b$. Observe how the minimum of the loss landscape (where the arrows switch direction) approaches the black line.
  • ...and 11 more figures
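
The Figure 5 caption describes the SD loss pulling the two outputs toward the line $a = b$. The sketch below illustrates one penalty with that behavior, under a loudly stated assumption: that the SD loss is cross-entropy plus a weight `gamma` times the standard deviation of the output vector. This page does not give the exact form, so `sd_loss`, `output_std`, and `gamma` are hypothetical names, and this is not the authors' implementation.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def output_std(logits):
    """Population standard deviation of the output vector."""
    mean = sum(logits) / len(logits)
    return math.sqrt(sum((z - mean) ** 2 for z in logits) / len(logits))


def sd_loss(logits, label, gamma):
    """ASSUMED form: cross-entropy plus gamma times the std of the outputs."""
    return cross_entropy(logits, label) + gamma * output_std(logits)


# With two outputs, the penalty vanishes exactly when a == b.
on_line = output_std([2.0, 2.0])
off_line = output_std([3.0, 1.0])
```

For two outputs the penalty reduces to $|a - b| / 2$, so a larger `gamma` moves the minimizer of the combined loss closer to $a = b$, which is consistent with what the caption reports.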

Theorems & Definitions (2)

  • Lemma 1
  • Corollary 1