Machine Unlearning under Overparameterization
Jacob L. Block, Aryan Mokhtari, Sanjay Shakkottai
TL;DR
This work tackles unlearning in overparameterized models where many interpolating solutions exist and gradient-based unlearning fails due to vanishing gradients. It introduces a bilevel objective that selects the simplest interpolator on the retain set, and a practical framework (MinNorm-OG) that uses a first-order, gradient-based relaxation requiring only gradients on the retain data at the original solution. The authors provide theoretical guarantees for linear models, linear networks, and two-layer perceptrons, linking the surrogate relaxation to exact unlearning under suitable regularizers. Empirically, MinNorm-OG outperforms retraining and several gradient-based baselines across multiple unlearning tasks, with favorable runtime characteristics. This advances unlearning theory and practice in highly overparameterized regimes with scalable, first-order methods.
Abstract
Machine unlearning algorithms aim to remove the influence of specific training samples, ideally recovering the model that would have resulted from training on the remaining data alone. We study unlearning in the overparameterized setting, where many models interpolate the data, and defining the solution as any loss minimizer over the retained set, as in prior work in the underparameterized setting, is inadequate, since the original model may already interpolate the retained data and satisfy this condition. In this regime, loss gradients vanish, rendering prior methods based on gradient perturbations ineffective, motivating both new unlearning definitions and algorithms. For this setting, we define the unlearning solution as the minimum-complexity interpolator over the retained data and propose a new algorithmic framework that only requires access to model gradients on the retained set at the original solution. We minimize a regularized objective over perturbations constrained to be orthogonal to these model gradients, a first-order relaxation of the interpolation condition. For different model classes, we provide exact and approximate unlearning guarantees and demonstrate that an implementation of our framework outperforms existing baselines across various unlearning experiments.
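As an illustration of the orthogonality-constrained objective described above, the following is a minimal sketch for the simplest case, a linear model f(x) = x·θ, under the assumption that "complexity" is measured by the squared Euclidean norm of θ. It is not the authors' MinNorm-OG implementation. All names (`theta_orig`, `X_retain`, `delta`, and so on) and the synthetic data are hypothetical. For a linear model the gradient of the model output with respect to θ is just the input x, so the constraint reduces to requiring that the perturbation be orthogonal to every retained input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical overparameterized setup: far more parameters than samples.
n_samples, n_params, n_forget = 20, 100, 5
X = rng.normal(size=(n_samples, n_params))
y = rng.normal(size=n_samples)

# Original model: an interpolator of ALL the data (here the min-norm one).
theta_orig = np.linalg.pinv(X) @ y

# Split into forget and retain sets.
X_forget, y_forget = X[:n_forget], y[:n_forget]
X_retain, y_retain = X[n_forget:], y[n_forget:]

# The loss gradient on the retain set vanishes at theta_orig, because the
# original model already interpolates the retained data.
loss_grad = X_retain.T @ (X_retain @ theta_orig - y_retain)

# Model gradients on the retain set are the rows of X_retain, so the
# constraint is X_retain @ delta = 0. Minimizing ||theta_orig + delta||^2
# under that constraint removes the component of theta_orig lying outside
# the row space of X_retain.
row_space_proj = np.linalg.pinv(X_retain) @ X_retain
delta = -(theta_orig - row_space_proj @ theta_orig)
theta_unlearned = theta_orig + delta

# Reference solution: the min-norm interpolator of the retained data alone.
theta_retrain = np.linalg.pinv(X_retain) @ y_retain

print("max |loss gradient| on retain set:", np.abs(loss_grad).max())
print("matches min-norm retain interpolator:",
      np.allclose(theta_unlearned, theta_retrain))
print("still interpolates retain set:",
      np.allclose(X_retain @ theta_unlearned, y_retain))
print("max forget-set residual after unlearning:",
      np.abs(X_forget @ theta_unlearned - y_forget).max())
```

In this linear, squared-norm setting the first-order relaxation is exact: any perturbation orthogonal to the retained inputs leaves the retained predictions unchanged, so the constrained minimizer coincides with the minimum-norm interpolator of the retained data. For nonlinear models the orthogonality condition only approximates the interpolation constraint, which is where the guarantees discussed in the abstract come in.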
