Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Siqiao Mu, Diego Klabjan

TL;DR

This paper introduces rewind-to-delete (R2D), a first-order, black-box certified unlearning algorithm for general nonconvex loss functions. It unlearns by rewinding to an earlier training checkpoint and applying limited gradient steps on the retained data, with Gaussian perturbations to guarantee $(\epsilon, \delta)$-unlearning. The authors provide rigorous privacy-utility-complexity analyses, develop a proximal-point-based checkpoint reconstruction, and prove PL-inequality–based generalization guarantees. Empirically, R2D outperforms existing certified and non-certified unlearning methods under realistic non-i.i.d. unlearning scenarios and demonstrates substantial computational advantages. The work also includes preliminary LLM unlearning experiments suggesting broad applicability of the rewinding paradigm.
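The rewind-then-retrain loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions:

- Training and unlearning both use full-batch vanilla gradient descent.
- The loss is a toy convex quadratic, chosen only so the result is easy to check. The paper targets general nonconvex losses.
- Gaussian noise is added once, to the unlearned output.
- The noise scale `sigma` is arbitrary here. It is not calibrated to $(\epsilon, \delta)$ as the paper's analysis would require.

All function names (`gd`, `mean_sq_grad`, `train_with_checkpoint`, `r2d_unlearn`) are hypothetical. `T` and `K` play the roles of the total and rewound step counts, so the rewind percent is $\frac{K}{T} \times 100\%$.

```python
import random

def gd(theta, grad, eta, steps):
    # Vanilla full-batch gradient descent: theta <- theta - eta * grad(theta).
    for _ in range(steps):
        g = grad(theta)
        theta = [x - eta * gi for x, gi in zip(theta, g)]
    return theta

def mean_sq_grad(points):
    # Hypothetical toy loss (1/n) * sum_i 0.5 * ||theta - x_i||^2;
    # its gradient is theta minus the data mean.
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return lambda theta: [t - m for t, m in zip(theta, mean)]

def train_with_checkpoint(theta0, grad, eta, total_steps, rewind_steps):
    # Train for T = total_steps on the full data, saving the iterate at step T - K.
    ckpt = gd(theta0, grad, eta, total_steps - rewind_steps)
    final = gd(ckpt, grad, eta, rewind_steps)
    return final, ckpt

def r2d_unlearn(ckpt, retained_grad, eta, rewind_steps, sigma, rng):
    # Rewind to the saved checkpoint, take K GD steps on the retained data only,
    # then add isotropic Gaussian noise with standard deviation sigma (assumed, uncalibrated).
    theta = gd(ckpt, retained_grad, eta, rewind_steps)
    return [x + rng.gauss(0.0, sigma) for x in theta]

data = [[1.0, 2.0], [3.0, 0.0], [10.0, 10.0]]  # the last point is the deletion request
retained = data[:-1]
T, K, eta, sigma = 100, 20, 0.1, 0.01          # illustrative values, not from the paper
_, ckpt = train_with_checkpoint([0.0, 0.0], mean_sq_grad(data), eta, T, K)
theta_unlearned = r2d_unlearn(ckpt, mean_sq_grad(retained), eta, K, sigma, random.Random(0))
print(f"rewind percent: {K / T * 100:.0f}%")   # prints "rewind percent: 20%"
```

The unlearning step never reads the final trained weights. It only needs the iterate from $K$ steps before the end of training. Per the TL;DR, the paper can reconstruct that iterate with proximal points if it was not stored. Larger $K$ moves the output closer to a model trained on the retained data alone, at the cost of more computation.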

Abstract

Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, in order to remove corrupted or outdated data or respect a user's "right to be forgotten." Certified machine unlearning is a strong theoretical guarantee based on differential privacy that quantifies the extent to which an algorithm erases data from the model weights. In contrast to existing works in certified unlearning for convex or strongly convex loss functions, or nonconvex objectives with limiting assumptions, we propose the first, first-order, black-box (i.e., can be applied to models pretrained with vanilla gradient descent) algorithm for unlearning on general nonconvex loss functions, which unlearns by "rewinding" to an earlier step during the learning process before performing gradient descent on the loss function of the retained data points. We prove $(ε, δ)$ certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for functions that satisfy the Polyak-Lojasiewicz inequality. Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice.
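For reference, the two conditions named in the abstract are usually written as below. These are the standard forms from the certified-unlearning and optimization literature, not necessarily the paper's exact statements. The notation is assumed for illustration:

- $A$ is the learning algorithm.
- $U$ is the unlearning algorithm.
- $D$ is the training set and $D_f$ is the set of points to forget.
- $\bar{A}$ is the comparator, taken here to be retraining on the retained data $D \setminus D_f$.

```latex
% (\epsilon, \delta)-certified unlearning: for every measurable set S of model weights,
\Pr\!\left[U(A(D), D, D_f) \in S\right] \le e^{\epsilon}\,\Pr\!\left[\bar{A}(D \setminus D_f) \in S\right] + \delta,
\qquad
\Pr\!\left[\bar{A}(D \setminus D_f) \in S\right] \le e^{\epsilon}\,\Pr\!\left[U(A(D), D, D_f) \in S\right] + \delta.

% Polyak-Lojasiewicz (PL) inequality with constant \mu > 0, where f^* = \inf_\theta f(\theta):
\tfrac{1}{2}\,\|\nabla f(\theta)\|^2 \ge \mu\,\bigl(f(\theta) - f^*\bigr) \qquad \text{for all } \theta.
```

Smaller $\epsilon$ and $\delta$ mean the unlearned model's output distribution is harder to distinguish from the comparator's. The PL inequality allows nonconvex $f$ while still tying gradient norm to suboptimality.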

Paper Structure (24 sections, 10 theorems, 79 equations, 7 figures, 16 tables, 3 algorithms)

Key Result

Lemma 2.1

Suppose $f(\theta)$ is continuously differentiable, $\theta_t$ is defined as in (eq:implicit), and let $\eta < \frac{1}{L}$. Then $\theta_t = \mathrm{prox}_{-f, \eta}(\theta_{t+1})$.
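The lemma can be checked numerically. This is a sketch under assumptions, not the paper's reconstruction procedure:

- Iterates follow plain gradient descent, $\theta_{t+1} = \theta_t - \eta \nabla f(\theta_t)$.
- $L$ is read as the Lipschitz constant of $\nabla f$.
- The loss `f` is a hypothetical separable nonconvex function.
- The prox is solved by fixed-point iteration on its optimality condition $x = \theta_{t+1} + \eta \nabla f(x)$. This is our choice of solver.

Because $\eta L < 1$, the prox objective $-f(x) + \frac{1}{2\eta}\|x - \theta_{t+1}\|^2$ is strongly convex. The fixed-point map is a contraction, so repeated prox steps walk the trajectory backwards.

```python
import math

# Hypothetical nonconvex loss f(theta) = sum_i [sin(3 theta_i) + 0.1 theta_i^2].
# f'' = -9 sin(3 theta) + 0.2 lies in [-8.8, 9.2], so grad f is L-Lipschitz with L = 9.2.
L = 9.2

def grad_f(theta):
    return [3.0 * math.cos(3.0 * x) + 0.2 * x for x in theta]

def gd_step(theta, eta):
    # Forward step: theta_{t+1} = theta_t - eta * grad f(theta_t).
    return [x - eta * g for x, g in zip(theta, grad_f(theta))]

def prox_neg_f(y, eta, tol=1e-12, max_iter=1000):
    # prox_{-f,eta}(y) = argmin_x { -f(x) + ||x - y||^2 / (2 eta) }.
    # For eta < 1/L this objective is strongly convex; its unique minimizer solves
    # x = y + eta * grad f(x), and the map x -> y + eta * grad f(x) is a contraction
    # with factor eta * L < 1, so fixed-point iteration converges to it.
    x = list(y)
    for _ in range(max_iter):
        x_new = [yi + eta * g for yi, g in zip(y, grad_f(x))]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

eta = 0.05                     # satisfies eta < 1/L (about 0.109)
trajectory = [[0.3, -1.2, 2.0]]
for _ in range(5):             # five forward GD steps
    trajectory.append(gd_step(trajectory[-1], eta))

theta = trajectory[-1]
for _ in range(5):             # five backward prox steps, one per GD step
    theta = prox_neg_f(theta, eta)
max_err = max(abs(a - b) for a, b in zip(theta, trajectory[0]))
print(f"max reconstruction error: {max_err:.1e}")
```

The printed error should be near floating-point tolerance. Each backward step can amplify earlier error by at most $1/(1 - \eta L)$, so many rewound steps with $\eta$ close to $1/L$ need a tighter `tol`.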

Figures (7)

  • Figure 1: Privacy-utility-complexity tradeoff of R2D compared to other certified unlearning methods (Constrained Newton Step and Hessian-Free method) on the eICU dataset. In Figures 1a, 1f, and 1g, we plot against the rewind percent, computed as $\frac{K}{T} \times 100 \%$.
  • Figure 2: Model performance vs. rewinding for $\sigma = 0.01$.
  • Figure 3: Membership inference attack success (AUC). We bold the two best results for each dataset and attack method.
  • Figure 3: Comparison of $K(\epsilon)$ and the analytically derived bound using real-world parameters from the eICU dataset.
  • Figure 4: MIA scores for the eICU dataset (top) and the Lacuna-100 dataset (bottom).
  • ...and 2 more figures

Theorems & Definitions (14)

  • Lemma 2.1
  • Theorem 3.1
  • Corollary 3.2
  • Corollary 3.3
  • Proof
  • Lemma A.1
  • Theorem A.2
  • Lemma A.3
  • Proof
  • Lemma A.4
  • ...and 4 more