Langevin Unlearning: A New Perspective of Noisy Gradient Descent for Machine Unlearning

Eli Chien, Haoyu Wang, Ziang Chen, Pan Li

TL;DR

The paper introduces Langevin unlearning, a framework based on projected noisy gradient descent (PNGD) that unifies differential privacy (DP) guarantees for learning with approximate unlearning. It proves that the learning process admits a unique stationary distribution and derives Rényi Unlearning (RU) guarantees showing that privacy improves exponentially with the number of unlearning iterations, covering non-convex, convex, and strongly convex settings, with extensions to sequential and batch unlearning requests. Empirically, it demonstrates favorable privacy-utility-complexity trade-offs on logistic regression tasks (MNIST, CIFAR-10) compared to the D2D and retraining baselines. The result is a principled approach to handling data removal requests with certified privacy guarantees.

Abstract

Machine unlearning has raised significant interest with the adoption of laws ensuring the "right to be forgotten". Researchers have provided a probabilistic notion of approximate unlearning under a definition similar to that of Differential Privacy (DP), where privacy is defined as statistical indistinguishability from retraining from scratch. We propose Langevin unlearning, an unlearning framework based on noisy gradient descent with privacy guarantees for approximate unlearning problems. Langevin unlearning unifies the DP learning process and the privacy-certified unlearning process with many algorithmic benefits. These include approximate certified unlearning for non-convex problems, complexity savings compared to retraining, and sequential and batch unlearning for multiple unlearning requests.
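The learn-then-unlearn idea described in the abstract can be illustrated with a minimal sketch. It assumes a noisy update of the form x ← Π(x − η∇f(x) + sqrt(2η)·σ·W) with projection onto an L2 ball, an unregularized logistic loss, and synthetic data. The function names, step size, noise scale, radius, and iteration counts are all hypothetical choices for illustration, not the paper's configuration.

```python
import numpy as np

def project_l2_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def logistic_grad(x, features, labels):
    """Gradient of the mean logistic loss; labels are in {-1, +1}."""
    margins = labels * (features @ x)
    weights = -labels / (1.0 + np.exp(margins))
    return features.T @ weights / len(labels)

def pngd(x0, features, labels, steps, eta, sigma, radius, rng):
    """Run `steps` projected noisy gradient descent updates starting at x0."""
    x = x0.copy()
    for _ in range(steps):
        noise = rng.standard_normal(x.shape)
        # Assumed noise scaling: sqrt(2 * eta) * sigma per step.
        x = x - eta * logistic_grad(x, features, labels) + np.sqrt(2.0 * eta) * sigma * noise
        x = project_l2_ball(x, radius)
    return x

# Synthetic binary classification data (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 5
features = rng.standard_normal((n, d))
labels = np.sign(features @ np.ones(d) + 0.1 * rng.standard_normal(n))
labels[labels == 0] = 1.0

# Learning: T noisy steps on the full dataset D.
x_learned = pngd(np.zeros(d), features, labels,
                 steps=500, eta=0.1, sigma=0.05, radius=10.0, rng=rng)

# Unlearning: remove one point to obtain D', then run K noisy steps on D'
# starting from the learned weights instead of retraining from scratch.
keep = np.arange(n) != 0
x_unlearned = pngd(x_learned, features[keep], labels[keep],
                   steps=5, eta=0.1, sigma=0.05, radius=10.0, rng=rng)
```

The same routine serves both phases; only the dataset and the initialization change. This is the sense in which the learning and unlearning processes are unified. The sketch does not compute any privacy guarantee; it only shows the mechanics of the updates.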

Paper Structure (35 sections, 22 theorems, 79 equations, 5 figures, 3 tables, 8 algorithms)

Key Result

Theorem 3.1

Suppose that the closed convex set $\mathcal{C}_R\subset \mathbb{R}^d$ is bounded with $\mathcal{C}_R$ having a positive Lebesgue measure and that $\nabla f_{\mathcal{D}}:\mathcal{C}_R\to\mathbb{R}^d$ is continuous. The Markov chain $\{x_t\}$ in eq:GLD_learning admits a unique invariant probability

Figures (5)

  • Figure 1: The geometric interpretation of relations between learning and unlearning. (Left) The RDP guarantee of the learning process induces a regular polyhedron. A smaller $\varepsilon_0$ implies an "easier" unlearning problem. (Right) Learning and unlearning processes on adjacent datasets, illustrating our main idea and results. More learning iterations give worse privacy (privacy erosion; Chourasia et al., 2021), while more unlearning iterations give better privacy, a phenomenon we term privacy recuperation.
  • Figure 2: Illustration of (a) sequential unlearning and (b) batch unlearning. For sequential unlearning, we can leverage the weak triangle inequality of Rényi divergence to connect all the error terms. For batch unlearning, only the initial RDP guarantee changes with a general group size. Notably, unlearning more samples at once implies a larger $\varepsilon_0$ (Theorem \ref{thm:NGD_learning_no_sc}), and thus we need more unlearning iterations to recuperate the privacy loss to a desired $\varepsilon$.
  • Figure 3: Main experiments, where the top and bottom rows are for MNIST and CIFAR10 respectively. (a) Comparison to D2D for unlearning one point using a limited number of unlearning iterations. This demonstrates the privacy-utility ($\epsilon$-accuracy) trade-off under a fixed unlearning complexity ($K$). For Langevin unlearning, we use only $K=1$ unlearning iteration. For D2D, we allow it not only to use $K=1,2,5$ unlearning iterations but also to keep the non-private internal state information. (b) Comparison to D2D for unlearning $100$ points, where all methods achieve an $(\epsilon,1/n)$-unlearning guarantee with $\epsilon=1$. For Langevin unlearning, we vary the unlearning batch size $S$ and combine it with the sequential unlearning result. For D2D, we do not allow it to keep the non-private internal state information in this experiment, so there is an inherent lower bound on the unlearning iterations per unlearning request. (c) A detailed investigation of the utility-complexity trade-off of Langevin unlearning when unlearning $S=100$ points at once under the fixed privacy constraint $\epsilon=1$. For each $\sigma$, we report the corresponding $\epsilon_0$ (black dashed line) for the initial $(\epsilon_0,1/n)$-DP guarantee and the utility after unlearning to $\epsilon=1$.
  • Figure 4: Trade-off between privacy ($\epsilon$), unlearning complexity ($K$), and the number of points to be unlearned ($S$) in the batch unlearning setting for MNIST. We fix $\sigma=0.03$ so that $K$ can be determined given $(\epsilon,S)$.
  • Figure 5: (a) The utility results that correspond to Figure \ref{fig:exp_fig23}. Since $\sigma$ is fixed, the utility is roughly the same. (b) The privacy-utility trade-off for unlearning one point, restricted to one (or $K$) unlearning update, on the Adult dataset.

Theorems & Definitions (35)

  • Definition 2.1: Log-Sobolev Inequality ($C_{\text{LSI}}$-LSI)
  • Definition 2.2: Rényi difference
  • Definition 2.3: Rényi Differential Privacy (RDP) (Mironov, 2017)
  • Definition 2.4: Rényi Unlearning (RU)
  • Theorem 3.1
  • Theorem 3.2: RU guarantee of PNGD unlearning
  • Theorem 3.3: RDP guarantee of PNGD learning
  • Corollary 3.4: Sequential unlearning
  • Proposition E.1
  • Theorem
  • ...and 25 more
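
The definitions listed above are built on Rényi divergence, so a small numeric sketch may help make the quantities concrete. It uses two standard facts: the closed-form Rényi divergence between equal-variance Gaussians, and the usual conversion from an $(\alpha,\varepsilon)$-RDP bound to an $(\varepsilon',\delta)$-DP bound. The function names are hypothetical, and treating the Rényi difference as the maximum over both argument orders is an assumption about Definition 2.2, whose text does not appear in this summary.

```python
import math

def renyi_divergence_gaussians(mu_p, mu_q, sigma, alpha):
    """D_alpha(N(mu_p, sigma^2) || N(mu_q, sigma^2)) for alpha > 1."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    return alpha * (mu_p - mu_q) ** 2 / (2.0 * sigma ** 2)

def renyi_difference_gaussians(mu_p, mu_q, sigma, alpha):
    """Assumed symmetrized form: the max of the divergence in both directions."""
    return max(
        renyi_divergence_gaussians(mu_p, mu_q, sigma, alpha),
        renyi_divergence_gaussians(mu_q, mu_p, sigma, alpha),
    )

def rdp_to_dp(alpha, eps_rdp, delta):
    """Standard conversion: (alpha, eps)-RDP implies (eps + log(1/delta)/(alpha-1), delta)-DP."""
    return eps_rdp + math.log(1.0 / delta) / (alpha - 1.0)

# Example: unit mean shift, unit noise, order alpha = 2.
eps_rdp = renyi_difference_gaussians(0.0, 1.0, sigma=1.0, alpha=2.0)
eps_dp = rdp_to_dp(alpha=2.0, eps_rdp=eps_rdp, delta=1e-3)
```

For equal-variance Gaussians the divergence is symmetric, so the maximum is redundant here; it is kept only to mirror the assumed definition. The same conversion would apply to an $(\epsilon, 1/n)$ guarantee of the kind quoted in the figure captions by setting `delta = 1 / n`.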