Machine Unlearning under Overparameterization

Jacob L. Block; Aryan Mokhtari; Sanjay Shakkottai

TL;DR

This work tackles unlearning in overparameterized models where many interpolating solutions exist and gradient-based unlearning fails due to vanishing gradients. It introduces a bilevel objective that selects the simplest interpolator on the retain set, and a practical framework (MinNorm-OG) that uses a first-order, gradient-based relaxation requiring only gradients on the retain data at the original solution. The authors provide theoretical guarantees for linear models, linear networks, and two-layer perceptrons, linking the surrogate relaxation to exact unlearning under suitable regularizers. Empirically, MinNorm-OG outperforms retraining and several gradient-based baselines across multiple unlearning tasks, with favorable runtime characteristics. This advances unlearning theory and practice in highly overparameterized regimes with scalable, first-order methods.
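The two optimization problems summarized above can be written out as follows. This is only a sketch: the notation is assumed here for illustration and is not taken from the paper. $\theta_0$ denotes the original parameters, $\mathcal{D}_r$ the retain set, $\mathcal{L}$ the training loss, $R$ a complexity measure, $\tilde{R}$ a regularized surrogate objective, and $f(\theta; x)$ the model output. The bilevel problem picks the simplest model among the loss minimizers on the retain set:

$$
\theta^\star \in \arg\min_{\theta} \; R(\theta) \quad \text{s.t.} \quad \theta \in \arg\min_{\theta'} \; \mathcal{L}(\theta'; \mathcal{D}_r)
$$

The first-order relaxation replaces the interpolation constraint with orthogonality to the model gradients at $\theta_0$. Since $f(\theta_0 + \Delta; x) \approx f(\theta_0; x) + \langle \nabla_\theta f(\theta_0; x), \Delta \rangle$, such a perturbation leaves the retain-set predictions unchanged to first order:

$$
\min_{\Delta} \; \tilde{R}(\theta_0 + \Delta) \quad \text{s.t.} \quad \langle \nabla_\theta f(\theta_0; x), \Delta \rangle = 0 \quad \text{for all } x \in \mathcal{D}_r
$$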

Abstract

Machine unlearning algorithms aim to remove the influence of specific training samples, ideally recovering the model that would have resulted from training on the remaining data alone. We study unlearning in the overparameterized setting, where many models interpolate the data, and defining the solution as any loss minimizer over the retained set, as in prior work in the underparameterized setting, is inadequate, since the original model may already interpolate the retained data and satisfy this condition. In this regime, loss gradients vanish, rendering prior methods based on gradient perturbations ineffective, motivating both new unlearning definitions and algorithms. For this setting, we define the unlearning solution as the minimum-complexity interpolator over the retained data and propose a new algorithmic framework that requires access only to model gradients on the retained set at the original solution. We minimize a regularized objective over perturbations constrained to be orthogonal to these model gradients, a first-order relaxation of the interpolation condition. For different model classes, we provide exact and approximate unlearning guarantees and demonstrate that an implementation of our framework outperforms existing baselines across various unlearning experiments.
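To make the orthogonality-constrained update concrete, here is a minimal NumPy sketch for the simplest case: a linear model $f(x) = \theta^\top x$ with a squared $\ell_2$ norm as the regularizer. Both choices are assumptions made for illustration, and the function name `min_norm_orthogonal_update` is hypothetical; this is not the authors' MinNorm-OG implementation. For a linear model the gradient of the output with respect to $\theta$ is the input $x$ itself, so the constraint set is the null space of the retain-set design matrix.

```python
import numpy as np


def min_norm_orthogonal_update(theta, retain_grads):
    """Solve min_delta ||theta + delta||^2  s.t.  retain_grads @ delta = 0.

    theta:        (d,) original parameters.
    retain_grads: (m, d) model gradients on the retain set at theta.
    """
    # Orthogonal projector onto the row space of the retain-set gradients.
    row_proj = np.linalg.pinv(retain_grads) @ retain_grads
    # The optimal perturbation cancels the part of theta that is
    # orthogonal to every retain-set gradient.
    delta = -(np.eye(theta.size) - row_proj) @ theta
    return theta + delta


# Toy overparameterized regression: 8 samples, 20 parameters (assumed sizes).
rng = np.random.default_rng(0)
n, d, n_forget = 8, 20, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Original model: minimum-norm interpolator of the full dataset.
theta_full = np.linalg.pinv(X) @ y

# Forget the first n_forget samples; keep the rest.
X_retain, y_retain = X[n_forget:], y[n_forget:]

# For f(x) = theta @ x, the gradient w.r.t. theta at any theta is x.
theta_unlearned = min_norm_orthogonal_update(theta_full, X_retain)

# Reference: minimum-norm interpolator trained on the retain set alone.
theta_retrained = np.linalg.pinv(X_retain) @ y_retain

print(np.allclose(theta_unlearned, theta_retrained))
```

In this linear toy setting the update only needs the retain-set gradients at the original solution, never the labels, and it matches the retain-only minimum-norm interpolator because the original model already fits the retained labels. For nonlinear models the constraint is only a first-order approximation of interpolation, so such a match should not be expected in general.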
