
Machine Unlearning under Overparameterization

Jacob L. Block, Aryan Mokhtari, Sanjay Shakkottai

TL;DR

This work tackles unlearning in overparameterized models where many interpolating solutions exist and gradient-based unlearning fails due to vanishing gradients. It introduces a bilevel objective that selects the simplest interpolator on the retain set, and a practical framework (MinNorm-OG) that uses a first-order, gradient-based relaxation requiring only gradients on the retain data at the original solution. The authors provide theoretical guarantees for linear models, linear networks, and two-layer perceptrons, linking the surrogate relaxation to exact unlearning under suitable regularizers. Empirically, MinNorm-OG outperforms retraining and several gradient-based baselines across multiple unlearning tasks, with favorable runtime characteristics. This advances unlearning theory and practice in highly overparameterized regimes with scalable, first-order methods.
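The bilevel objective and its first-order relaxation can be written compactly. The notation below is ours, not the paper's: $\mathcal{D}_r$ is the retain set, $R$ is a complexity regularizer, and model outputs are scalar for simplicity. The paper's exact regularized objective may contain terms not shown here. In the interpolating regime, the inner loss minimization over $\mathcal{D}_r$ reduces to the interpolation constraint.

```latex
\begin{aligned}
&\text{Unlearning target:} &&
\bm{\theta}_{\mathrm{U}} \in \operatorname*{arg\,min}_{\bm{\theta}} R(\bm{\theta})
\quad \text{s.t.} \quad f(\bm{\theta},\bm{x}) = \bm{y} \;\; \forall (\bm{x},\bm{y}) \in \mathcal{D}_r, \\
&\text{First-order relaxation:} &&
\bm{\Delta}^{\star} \in \operatorname*{arg\,min}_{\bm{\Delta}} R(\bm{\theta}^* + \bm{\Delta})
\quad \text{s.t.} \quad \big\langle \nabla_{\bm{\theta}} f(\bm{\theta}^*,\bm{x}), \bm{\Delta} \big\rangle = 0 \;\; \forall (\bm{x},\bm{y}) \in \mathcal{D}_r .
\end{aligned}
```

The constraint comes from a first-order Taylor expansion. Because $\bm{\theta}^*$ interpolates $\mathcal{D}_r$, we have $f(\bm{\theta}^*+\bm{\Delta},\bm{x}) \approx \bm{y} + \langle \nabla_{\bm{\theta}} f(\bm{\theta}^*,\bm{x}), \bm{\Delta}\rangle$. Requiring $\bm{\Delta}$ to be orthogonal to every retain-set gradient therefore preserves interpolation to first order. The unlearned model is $\bm{\theta}^*+\bm{\Delta}^{\star}$.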

Abstract

Machine unlearning algorithms aim to remove the influence of specific training samples, ideally recovering the model that would have resulted from training on the remaining data alone. We study unlearning in the overparameterized setting, where many models interpolate the data. In this setting, defining the solution as any loss minimizer over the retained set (as in prior work in the underparameterized setting) is inadequate, since the original model may already interpolate the retained data and satisfy this condition. In this regime, loss gradients vanish, rendering prior methods based on gradient perturbations ineffective, motivating both new unlearning definitions and algorithms. For this setting, we define the unlearning solution as the minimum-complexity interpolator over the retained data and propose a new algorithmic framework that only requires access to model gradients on the retained set at the original solution. We minimize a regularized objective over perturbations constrained to be orthogonal to these model gradients, a first-order relaxation of the interpolation condition. For different model classes, we provide exact and approximate unlearning guarantees and demonstrate that an implementation of our framework outperforms existing baselines across various unlearning experiments.
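The orthogonal-perturbation idea can be checked exactly in the simplest case. The sketch below is a minimal illustration under our own assumptions and is not the paper's MinNorm-OG implementation. It uses a linear model $f(\theta, x) = x^\top\theta$, the squared-norm regularizer $R(\theta)=\|\theta\|^2$, and synthetic random data, and all variable names are hypothetical.

For a linear model, the gradient on a retain sample is the sample itself. Perturbations orthogonal to the retain gradients therefore form the null space of the retain design matrix. Minimizing the regularizer over that subspace projects the original interpolator onto the row space of the retain data.

```python
import numpy as np

# Assumptions (ours, for illustration): linear model f(theta, x) = x @ theta,
# regularizer R(theta) = ||theta||^2, random synthetic data.
rng = np.random.default_rng(0)
n, d = 20, 50                          # overparameterized: d > n
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Original model: an interpolator of the full data that is NOT min-norm,
# formed by adding a null(X) direction to the min-norm interpolator.
X_pinv = np.linalg.pinv(X)
null_dir = rng.normal(size=d)
null_dir -= X_pinv @ (X @ null_dir)    # project onto null(X)
theta_star = X_pinv @ y + null_dir

# Forget the first 5 samples and retain the rest.
X_r, y_r = X[5:], y[5:]

# Linear model: grad_theta f(theta*, x_i) = x_i, so "orthogonal to retain
# gradients" means X_r @ delta = 0. The minimizer of ||theta* + delta||^2
# over that subspace removes the null(X_r) component of theta*.
P_row = np.linalg.pinv(X_r) @ X_r      # orthogonal projector onto row space of X_r
delta = P_row @ theta_star - theta_star
theta_unlearned = theta_star + delta

# Reference: minimum-norm interpolator of the retain set alone.
theta_gt = np.linalg.pinv(X_r) @ y_r

# Distance to the reference before and after the perturbation.
print(np.linalg.norm(theta_star - theta_gt), np.linalg.norm(theta_unlearned - theta_gt))
```

The original model already has zero loss on the retain set, yet it differs from the minimum-norm retain interpolator. In this linear, squared-norm case, the projection removes exactly that discrepancy. Nonlinear models only satisfy the constraint to first order.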

Paper Structure

This paper contains 43 sections, 14 theorems, 71 equations, 14 figures, 22 tables, and 1 algorithm.

Key Result

Theorem 2.1

Let $f(\bm{\theta}^*,\cdot)$ interpolate $\mathcal{D}$, so that $f(\bm{\theta}^*,\bm{x}) = \bm{y}$ for all $(\bm{x},\bm{y}) \in \mathcal{D}$, and let $M_{\text{LG}}$ be any loss-gradient unlearning method. If the sample loss $\mathcal{L}$ …
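The theorem statement above is truncated, so its exact conclusion is not reproduced here. The sketch below only illustrates the mechanism the abstract describes: at a model that interpolates the data, per-sample squared-loss gradients vanish, so any update built from them leaves the model unchanged. Our assumptions are a linear model, squared loss, and random data. As a generic stand-in for a loss-gradient method, it uses a hypothetical gradient-difference style step (descent on the retain loss, ascent on the forget loss) with an arbitrary learning rate.

```python
import numpy as np

# Assumptions (ours): linear model, squared loss, synthetic data with d > n.
rng = np.random.default_rng(1)
n, d = 10, 30
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
theta_star = np.linalg.pinv(X) @ y     # interpolates every sample in D

def mse_grad(theta, X, y):
    """Gradient of (1 / 2m) * ||X @ theta - y||^2 with respect to theta."""
    residual = X @ theta - y
    return X.T @ residual / len(y)

forget, retain = np.arange(3), np.arange(3, n)
g_retain = mse_grad(theta_star, X[retain], y[retain])
g_forget = mse_grad(theta_star, X[forget], y[forget])

# Hypothetical loss-gradient update: descend on retain, ascend on forget.
lr = 0.1
theta_next = theta_star - lr * g_retain + lr * g_forget

# Both gradients are (numerically) zero, so the update is a no-op.
print(np.abs(g_retain).max(), np.abs(g_forget).max())
```

Because every residual is zero at interpolation, both gradients vanish no matter how the retain and forget terms are weighted. This is why the framework relies on model gradients rather than loss gradients.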

Figures (14)

  • Figure 1: Example unlearned model fits when given 100 unlearning epochs for the Data Poisoning experiment, where the forget points distort the retain set trend $y = \sin(x)$.
  • Figure 2: Pareto frontiers for each method across hyperparameter settings in the Multi-Class Label Erasure experiment on colored versions of CIFAR-10 (left) and Tiny ImageNet (right). Models predict color and content, but the retain set contains only gray images. The ground-truth unlearned model (GT) performs well on gray inputs but always predicts gray with probability 1. The x-axis shows accuracy on gray test images (higher is better), and the y-axis shows the mean squared error between the predicted probability of gray on all inputs and the target of 1 (lower is better). MinNorm-OG (ours) approaches the ground-truth unlearned model's performance more closely than the baselines do.
  • Figure 3: Example unlearned model fits when given 10 unlearning epochs for the Data Poisoning experiment, where the forget points distort the retain set trend $y = \sin(x)$.
  • Figure 4: Example unlearned model fits when given 100 unlearning epochs for the Data Poisoning experiment, where the forget points distort the retain set trend $y = \sin(x)$.
  • Figure 5: Example unlearned model fits when given 1000 unlearning epochs for the Data Poisoning experiment, where the forget points distort the retain set trend $y = \sin(x)$.
  • ...and 9 more figures
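The two axes of Figure 2 reduce to simple formulas. The helper below is hypothetical. It assumes the model outputs a probability vector over color classes with a known "gray" column, plus a content prediction. It also reads "accuracy on gray test images" as content accuracy, which the caption does not state explicitly.

```python
import numpy as np

def figure2_metrics(color_probs, content_pred, content_true, is_gray, gray_idx):
    """Hypothetical helper computing the two Figure 2 axes.

    color_probs:  (N, C) predicted probabilities over C color classes
    content_pred: (N,) predicted content labels
    content_true: (N,) true content labels
    is_gray:      (N,) boolean mask marking gray test images
    gray_idx:     column of color_probs holding P(gray)
    """
    # x-axis (higher is better): accuracy restricted to gray test images.
    gray_acc = float(np.mean(content_pred[is_gray] == content_true[is_gray]))
    # y-axis (lower is better): MSE between P(gray) on ALL inputs and target 1.
    gray_mse = float(np.mean((color_probs[:, gray_idx] - 1.0) ** 2))
    return gray_acc, gray_mse

# Toy usage with three test inputs, two of them gray.
probs = np.array([[0.9, 0.1], [1.0, 0.0], [0.5, 0.5]])
acc, mse = figure2_metrics(
    probs,
    content_pred=np.array([1, 2, 3]),
    content_true=np.array([1, 0, 3]),
    is_gray=np.array([True, True, False]),
    gray_idx=0,
)
print(acc, mse)
```

A ground-truth unlearned model that always outputs P(gray) = 1 scores exactly 0 on the y-axis by construction. This is why GT serves as the target point in the caption.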

Theorems & Definitions (19)

  • Definition 2.1
  • Theorem 2.1
  • Theorem 4.1
  • Lemma 1
  • Theorem 4.2
  • Theorem 4.3
  • Lemma 2
  • Theorem 4.4
  • Proposition 1
  • Corollary 1
  • ...and 9 more