Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu, Diego Klabjan
TL;DR
This paper introduces rewind-to-delete (R2D), a first-order, black-box certified unlearning algorithm for general nonconvex loss functions. It unlearns by rewinding to an earlier training checkpoint and applying limited gradient steps on the retained data, with Gaussian perturbations to guarantee $(\epsilon, \delta)$-unlearning. The authors provide rigorous privacy-utility-complexity analyses, develop a proximal-point-based checkpoint reconstruction, and prove PL-inequality–based generalization guarantees. Empirically, R2D outperforms existing certified and non-certified unlearning methods under realistic non-i.i.d. unlearning scenarios and demonstrates substantial computational advantages. The work also includes preliminary LLM unlearning experiments suggesting broad applicability of the rewinding paradigm.
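The rewind-then-descend procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear least-squares loss is a stand-in for a general nonconvex loss, the names `grad_mse`, `train_with_checkpoints`, and `rewind_to_delete` are hypothetical, the number of retained-data steps is set equal to the rewind depth for simplicity, and the noise scale `sigma` is a free parameter here rather than the value calibrated by the paper's $(\epsilon, \delta)$ analysis.

```python
import numpy as np

def grad_mse(theta, X, y):
    # Gradient of mean squared error for a linear model.
    # Stand-in loss for illustration only (assumption, not from the paper).
    return 2.0 * X.T @ (X @ theta - y) / len(y)

def train_with_checkpoints(X, y, steps, lr):
    # Vanilla gradient descent on the full dataset, saving every iterate.
    theta = np.zeros(X.shape[1])
    checkpoints = [theta.copy()]
    for _ in range(steps):
        theta = theta - lr * grad_mse(theta, X, y)
        checkpoints.append(theta.copy())
    return checkpoints

def rewind_to_delete(checkpoints, X_retain, y_retain, rewind_steps, lr, sigma, rng):
    # 1. Rewind to the checkpoint `rewind_steps` iterations before the end.
    total_steps = len(checkpoints) - 1
    theta = checkpoints[total_steps - rewind_steps].copy()
    # 2. Run gradient descent on the retained data only.
    for _ in range(rewind_steps):
        theta = theta - lr * grad_mse(theta, X_retain, y_retain)
    # 3. Add Gaussian noise; sigma is a hypothetical input here, whereas a
    #    certified guarantee requires calibrating it to (epsilon, delta).
    return theta + rng.normal(0.0, sigma, size=theta.shape)
```

A usage example, assuming the first 10 rows are the data to unlearn: call `train_with_checkpoints(X, y, steps=300, lr=0.05)`, then pass the returned list to `rewind_to_delete(checkpoints, X[10:], y[10:], rewind_steps=50, lr=0.05, sigma=0.01, rng=np.random.default_rng(0))`. A larger rewind depth costs more computation but moves the starting point further from the influence of the deleted data, which is the tradeoff the paper analyzes.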
Abstract
Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, whether to purge corrupted or outdated data or to respect a user's "right to be forgotten." Certified machine unlearning is a strong theoretical guarantee, based on differential privacy, that quantifies the extent to which an algorithm erases data from the model weights. Existing works in certified unlearning address convex or strongly convex loss functions, or nonconvex objectives with limiting assumptions. In contrast, we propose the first first-order, black-box algorithm for unlearning on general nonconvex loss functions, where black-box means that it can be applied to models pretrained with vanilla gradient descent. The algorithm unlearns by "rewinding" to an earlier step of the learning process and then performing gradient descent on the loss function of the retained data points. We prove $(\epsilon, \delta)$ certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for functions that satisfy the Polyak-Łojasiewicz inequality. Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice.
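For reference, the two conditions named in the abstract are commonly written as follows. These are the standard forms from the literature, given here as a sketch; the paper's exact definitions may differ in detail. The symbols are assumptions introduced for illustration: $A$ is the learning algorithm, $U$ the unlearning algorithm, $D$ the full dataset, $Z \subseteq D$ the data to be removed, $\bar{A}$ a reference procedure that only sees $D \setminus Z$, $S$ any measurable set of model weights, $f$ the loss with minimum value $f^*$, and $\mu > 0$ a constant.

$(\epsilon, \delta)$ certified unlearning requires that the unlearned model be statistically close to one that never saw $Z$:

$$
\Pr\big[U(A(D), D, Z) \in S\big] \le e^{\epsilon} \Pr\big[\bar{A}(D \setminus Z) \in S\big] + \delta,
\qquad
\Pr\big[\bar{A}(D \setminus Z) \in S\big] \le e^{\epsilon} \Pr\big[U(A(D), D, Z) \in S\big] + \delta .
$$

The Polyak-Łojasiewicz inequality requires that the squared gradient norm dominate the suboptimality gap at every point $\theta$:

$$
\frac{1}{2} \, \|\nabla f(\theta)\|^2 \ge \mu \, \big(f(\theta) - f^*\big).
$$

The second condition holds for all strongly convex functions but also for some nonconvex ones.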
