Distributional Machine Unlearning via Selective Data Removal

Youssef Allouah, Rachid Guerraoui, Sanmi Koyejo

TL;DR

This work tackles domain-level unlearning in ML by introducing distributional unlearning, a framework that uses KL-divergence constraints to simultaneously maximize distance from an unwanted distribution and preserve a retained distribution. It establishes a formal $(\alpha, \varepsilon)$-Pareto frontier for Gaussian and exponential-family models, and proves downstream log-loss guarantees for edited data. The authors propose a distance-based selective removal algorithm, showing quadratic gains in sample efficiency over random deletion in low-divergence regimes and validating the approach across synthetic data and real-world tasks like CIFAR-10 and Jigsaw Toxic Comments, with synergy observations for existing sample-level unlearning methods. The results suggest that strong unlearning effects can be achieved with substantially smaller forget sets, enabling scalable and principled subpopulation unlearning with practical downstream robustness.

Abstract

Machine learning systems increasingly face requirements to remove entire domains of information -- such as toxic language or biases -- rather than individual user data. This task presents a dilemma: full removal of the unwanted domain data is computationally expensive, while random partial removal is statistically inefficient. We find that a domain's statistical influence is often concentrated in a small subset of its data samples, suggesting a path between ineffective partial removal and unnecessary complete removal. We formalize this as distributional unlearning: a framework to select a small subset that balances forgetting an unwanted distribution while preserving a desired one. Using Kullback-Leibler divergence constraints, we derive the exact removal-preservation Pareto frontier for exponential families and prove that models trained on the edited data achieve corresponding log-loss bounds. We propose a distance-based selection algorithm and show it is quadratically more sample-efficient than random removal in the challenging low-divergence regime. Experiments across synthetic, text, and image datasets (Jigsaw, CIFAR-10, SMS spam) show our method requires 15-82% less deletion than full removal for strong unlearning effects, e.g., halving initial forget set accuracy. Ultimately, by showing a small forget set often suffices, our framework lays the foundations for more scalable and rigorous subpopulation unlearning.
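The contrast between distance-based selection and random removal can be illustrated on synthetic Gaussians. The sketch below is not the authors' implementation: it assumes the selection score is the Euclidean distance of each forget sample to the retained-set mean, that the "model" is simply a mean fitted to the pooled edited data with identity covariance, and that removal and preservation are measured as $\mathrm{KL}(p_1 \| p)$ and $\mathrm{KL}(p_2 \| p)$. The names `kl_shared_cov`, `edited_mean`, and `compare` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def kl_shared_cov(mu_a, mu_b, cov_inv):
    """KL(N(mu_a, S) || N(mu_b, S)) for a shared covariance S."""
    diff = np.asarray(mu_a) - np.asarray(mu_b)
    return 0.5 * float(diff @ cov_inv @ diff)


def edited_mean(forget, retain, n_delete, strategy, rng):
    """Delete n_delete forget samples, then fit a mean on what is left."""
    n_keep = len(forget) - n_delete
    if strategy == "selective":
        # Assumed score: Euclidean distance to the retained-set mean.
        scores = np.linalg.norm(forget - retain.mean(axis=0), axis=1)
        keep = np.argsort(scores)[:n_keep]  # keep closest, drop farthest
    elif strategy == "random":
        keep = rng.permutation(len(forget))[:n_keep]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return np.vstack([forget[keep], retain]).mean(axis=0)


def compare(seed=0, n=2000, dim=2, n_delete=600):
    """Compare both strategies at the same deletion budget."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = np.full(dim, 1.0), np.zeros(dim)  # forget / retain means
    forget = rng.normal(mu1, 1.0, size=(n, dim))
    retain = rng.normal(mu2, 1.0, size=(n, dim))
    cov_inv = np.eye(dim)  # identity covariance assumed
    out = {}
    for strategy in ("selective", "random"):
        mu_hat = edited_mean(forget, retain, n_delete, strategy, rng)
        out[strategy] = {
            "removal": kl_shared_cov(mu1, mu_hat, cov_inv),       # KL(p1 || p)
            "preservation": kl_shared_cov(mu2, mu_hat, cov_inv),  # KL(p2 || p)
        }
    return out


if __name__ == "__main__":
    for name, metrics in compare().items():
        print(name, metrics)
```

At an equal deletion budget, a higher `removal` value together with a lower `preservation` value indicates the more efficient strategy under these assumptions. The score used here is only one plausible choice; the scores named in the figure captions (such as lr-cos or lr-maha) are not reproduced.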

Paper Structure

This paper contains 39 sections, 15 theorems, 122 equations, 6 figures, 2 tables.

Key Result

Proposition 1

Let $p_1, p_2$ be two distributions in $\mathcal{P}$, the class of Gaussian distributions with shared positive covariance. The Pareto frontier of $(\alpha, \varepsilon)$ values achievable in $\mathcal{P}$ is:
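As a hedged illustration of how such a trade-off can be computed (a sketch under stated assumptions, not the proposition's own statement): suppose Definition 1 requires $\mathrm{KL}(p_1 \| p) \ge \alpha$ for removal and $\mathrm{KL}(p_2 \| p) \le \varepsilon$ for preservation, and write $p_i = \mathcal{N}(\mu_i, \Sigma)$ and $p = \mathcal{N}(\mu, \Sigma)$. The notation $\mu_i$, $\mu$, $\Sigma$ and the norm $\|v\|_{\Sigma^{-1}} = \sqrt{v^\top \Sigma^{-1} v}$ are introduced here for illustration. The standard closed form for shared-covariance Gaussians, combined with the triangle inequality, then gives the largest removal level attainable at preservation level $\varepsilon$:

$$
\mathrm{KL}(p_i \| p) = \tfrac{1}{2} (\mu_i - \mu)^\top \Sigma^{-1} (\mu_i - \mu) = \tfrac{1}{2} \|\mu_i - \mu\|_{\Sigma^{-1}}^2
$$

$$
\max_{\mu \,:\, \mathrm{KL}(p_2 \| p) \le \varepsilon} \mathrm{KL}(p_1 \| p)
= \tfrac{1}{2} \left( \|\mu_1 - \mu_2\|_{\Sigma^{-1}} + \sqrt{2\varepsilon} \right)^2
= \left( \sqrt{\mathrm{KL}(p_1 \| p_2)} + \sqrt{\varepsilon} \right)^2
$$

For $\mu_1 \neq \mu_2$, the maximum is attained by moving $\mu$ away from $\mu_1$ along the line through $\mu_1$ and $\mu_2$, until it sits at $\Sigma^{-1}$-distance $\sqrt{2\varepsilon}$ beyond $\mu_2$.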

Figures (6)

  • Figure 1: Synthetic Gaussians. The empirical frontier aligns with the theoretical prediction.
  • Figure 2: Synthetic Gaussians. Selective removal consistently requires fewer deletions, especially when $\mathrm{KL}(p_1 \| p_2)$ is small (left), for the same removal and preservation target as random removal. In high-divergence regimes (right), the gap between methods shrinks, as predicted by the theory.
  • Figure 3: CIFAR‑10 images. Removing cat images suppresses accuracy on that class (left) while leaving accuracy on the retained nine classes essentially unchanged (right, ${<}0.03$ variation). No substantial removal effect is observed until $50\%$ deletion, after which the selective removal strategies lr-maha and maha-mu2 outperform random removal. Error bars: $\pm 1$ standard error over thirty seeds.
  • Figure 4: Jigsaw Toxic Comments. Impact of removing profane comments. Left: recall on the to‑be‑forgotten set $p_1$; right: F$_1$ on the retained set $p_2$. Utility is almost unchanged up to $60\%$ deletion; marked forgetting appears only around $80\%$ deletion, with lr‑cos showing the steepest drop. Error bars: $\pm 1$ standard error over five random seeds.
  • Figure 5: SMS Spam. The likelihood-ratio score (lr-cos) pushes spam recall below $0.6$ after deleting $70\%$ of spam, whereas random deletion needs nearly $90\%$ removal to reach the same point. Ham performance remains almost flat (${<}0.004$ absolute change) until the final $100\%$ budget, affirming the tight preservation guarantee. Error bars: $\pm 1$ standard error over ten seeds.
  • ...and 1 more figure

Theorems & Definitions (28)

  • Definition 1: $(\alpha,\varepsilon)$-Distributional Unlearning
  • Proposition 1: Pareto Frontier
  • Proposition 2
  • Proposition 3: Random Removal
  • Theorem 0: Selective Removal
  • Proposition 3: Pareto Frontier
  • proof
  • Theorem 1: Pareto Frontier--Exponential Families
  • proof
  • Proposition 3
  • ...and 18 more