The Utility and Complexity of in- and out-of-Distribution Machine Unlearning

Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo

TL;DR

This work formalizes approximate machine unlearning via $(q,\varepsilon)$-Rényi guarantees and separates in-distribution from out-of-distribution forget data. For in-distribution forget data, it proves that a simple noisy ERM procedure with output perturbation achieves dimension-free deletion capacity, with near-linear time and space complexity and a sharp separation from differential privacy. For out-of-distribution forget data, it introduces a robust gradient method based on coordinate-wise trimmed means that amortizes unlearning time, with bounds driven by the interpolation error, achieving near-linear time and constant-factor deletion capacity under strong conditions. Together, these results give formal guarantees for practical unlearning under privacy and robustness constraints, and point to unified upper bounds and extensions to richer models as key directions for future work.
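
The noisy ERM procedure with output perturbation can be illustrated with a minimal Python sketch, assuming ridge regression as a stand-in for a strongly convex and smooth loss. The function names, the step size, and the parameter `delta` (an assumed bound on the distance between the fine-tuned iterate and the retain-set minimizer) are hypothetical and not taken from the paper. The noise scale relies on the standard identity that two Gaussians with common covariance $\sigma^2 I$ and means at distance $\Delta$ have order-$q$ Rényi divergence $q\Delta^2/(2\sigma^2)$.

```python
import numpy as np

def ridge_grad(theta, X, y, lam):
    """Gradient of (1/2n)||X theta - y||^2 + (lam/2)||theta||^2."""
    n = X.shape[0]
    return X.T @ (X @ theta - y) / n + lam * theta

def ridge_erm(X, y, lam):
    """Exact minimizer of the ridge objective above (closed form)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def gaussian_sigma(q, eps, delta):
    """Noise scale such that q * delta^2 / (2 * sigma^2) <= eps."""
    return delta * np.sqrt(q / (2.0 * eps))

def unlearn(theta, X, y, forget_idx, lam, steps, lr, q, eps, delta, rng):
    """Fine-tune on the retain set, then perturb the output with Gaussian noise.

    `delta` is an assumed (hypothetical) bound on the distance between the
    fine-tuned iterate and the retain-set minimizer; it is not computed here.
    """
    keep = np.setdiff1d(np.arange(X.shape[0]), forget_idx)
    Xr, yr = X[keep], y[keep]
    for _ in range(steps):
        theta = theta - lr * ridge_grad(theta, Xr, yr, lam)
    sigma = gaussian_sigma(q, eps, delta)
    return theta + sigma * rng.standard_normal(theta.shape)
```

In this sketch, a smaller `delta` (more fine-tuning steps) permits less noise for the same $(q,\varepsilon)$ target, which is one way to picture the time-utility trade-off the summary refers to.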

Abstract

Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data -- data similar to the retain set -- we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs, addressing a previous theoretical gap on the separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail with out-of-distribution forget data -- data significantly different from the retain set -- where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.
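
The robust gradient descent idea can be sketched as follows, assuming the coordinate-wise trimmed mean mentioned in the TL;DR as the aggregation rule and a squared loss for concreteness. The function names and hyperparameters are hypothetical, and adding the Gaussian noise once to the final iterate is an assumption of this sketch, not a statement of the paper's algorithm.

```python
import numpy as np

def trimmed_mean(grads, f):
    """Coordinate-wise trimmed mean of per-sample gradients.

    For each coordinate, drop the f smallest and f largest values and
    average the rest. `grads` has shape (n, d) and we require 2f < n.
    """
    n = grads.shape[0]
    if 2 * f >= n:
        raise ValueError("need 2 * f < n")
    s = np.sort(grads, axis=0)
    return s[f:n - f].mean(axis=0)

def per_sample_grads(theta, X, y):
    """Per-sample gradients of the squared loss 0.5 * (x^T theta - y)^2."""
    return (X @ theta - y)[:, None] * X

def robust_noisy_gd(X, y, f, steps, lr, sigma, rng):
    """Gradient descent with trimmed-mean aggregation, plus output noise."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        g = trimmed_mean(per_sample_grads(theta, X, y), f)
        theta = theta - lr * g
    # Assumption: noise is added once at the end (illustrative choice).
    return theta + sigma * rng.standard_normal(theta.shape)
```

The intuition is that up to `f` out-of-distribution samples have bounded influence on every aggregated gradient, so the trained model is already close to one trained without them.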

Paper Structure

This paper contains 23 sections, 14 theorems, 91 equations, 2 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

Assume that, for every $\mathbf{z} \in \mathcal{Z}$, the loss $\ell(\cdot~;\mathbf{z})$ is $\mu$-strongly convex and $L$-smooth. Consider any unlearning-training pair $(\mathcal{U}, \mathcal{A})$, with output $\hat{\bm{\theta}} \coloneqq \mathcal{U}(\mathcal{S}_f, \mathcal{A}(\mathcal{S}))$, and rec Moreover, if $\nabla \ell(\bm{\theta}^\star; \mathbf{z}), \mathbf{z} \sim \mathcal{D},$ is sub-Gaus

Figures (2)

  • Figure 1: Numerical validation on a linear regression task with synthetic data for the same unlearning budget, with in-distribution (left) and out-of-distribution (right) data. The in-distribution forget set is sampled at random, while the out-of-distribution data is obtained by shifting labels with a fixed offset. Additional details and results on real data can be found in Appendix \ref{app:details}.
  • Figure 2: Out-of-distribution error $\mathcal{L}_{\mathrm{OOD}}$ versus number of unlearning iterations for Algorithms \ref{alg:general} and \ref{alg:tgd}, using gradient descent as optimizer, with $f \in \{1, 0.1 n, 0.45 n\}$ forget data out of $16{,}512$ samples of the California Housing dataset. The per-iteration cost is the same for both algorithms. The unlearning time of Algorithm \ref{alg:general} (non-robust) can be $10\times$ longer than that of Algorithm \ref{alg:tgd}.
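
The forget-set construction described in the Figure 1 caption can be sketched as below. This is a hypothetical reconstruction: the sample size, dimension, noise level, and offset are placeholders, and the helper names are not from the paper.

```python
import numpy as np

def make_linear_data(n, d, noise, rng):
    """Synthetic linear regression data: y = X w + Gaussian noise."""
    w = rng.standard_normal(d)
    X = rng.standard_normal((n, d))
    y = X @ w + noise * rng.standard_normal(n)
    return X, y, w

def split_forget(n, f, rng):
    """In-distribution forget set: f indices sampled uniformly at random."""
    forget = rng.choice(n, size=f, replace=False)
    retain = np.setdiff1d(np.arange(n), forget)
    return forget, retain

def shift_labels(y, forget, offset):
    """Out-of-distribution forget set: shift forget labels by a fixed offset."""
    y_ood = y.copy()
    y_ood[forget] += offset
    return y_ood
```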

Theorems & Definitions (29)

  • Definition 1: $(q, \varepsilon)$-approximate unlearning
  • Definition 2
  • Definition 3
  • Proposition 1
  • Theorem 0
  • Corollary 0
  • Proposition 2
  • Theorem 0
  • Definition 4: $L$-smoothness
  • Definition 5: $\mu$-strong convexity
  • ...and 19 more
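
For reference, the notions named in the list above have the following standard textbook forms; the paper's exact statements (for instance, the precise pair of distributions compared in Definition 1) may differ. Here $g$ is a differentiable function on $\mathbb{R}^d$, and $P$, $Q$ are distributions with $P$ absolutely continuous with respect to $Q$.

```latex
% Rényi divergence of order q > 1
D_q(P \,\|\, Q) = \frac{1}{q-1} \log \mathbb{E}_{X \sim Q}\!\left[ \left( \frac{\mathrm{d}P}{\mathrm{d}Q}(X) \right)^{q} \right]

% L-smoothness
\| \nabla g(\bm{\theta}) - \nabla g(\bm{\theta}') \| \le L \, \| \bm{\theta} - \bm{\theta}' \|
\quad \text{for all } \bm{\theta}, \bm{\theta}'

% \mu-strong convexity
g(\bm{\theta}') \ge g(\bm{\theta}) + \langle \nabla g(\bm{\theta}), \bm{\theta}' - \bm{\theta} \rangle + \frac{\mu}{2} \| \bm{\theta}' - \bm{\theta} \|^2
\quad \text{for all } \bm{\theta}, \bm{\theta}'
```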