How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks?
Gal Alon, Yehuda Dar
TL;DR
The paper investigates how the parameterization level of deep neural networks, captured by width, affects machine unlearning, i.e., updating a trained model to forget specific training data. It develops a validation-based hyperparameter-tuning framework for the SCRUB, NegGrad, and L1 sparsity unlearning methods and evaluates two unlearning goals (privacy of the forgotten data and bias removal) across under- and overparameterized DNNs on multiple datasets. The key finding is that overparameterized models generally achieve a better balance between maintaining generalization and fulfilling the unlearning goal, though for bias removal this holds only when the unlearning method explicitly uses the forget set. The privacy improvements are explained by localized changes in the decision regions around the forget samples. These results provide guidance for selecting architectures and unlearning methods in practice and motivate future parameterization-aware unlearning approaches.
Abstract
Machine unlearning is the task of updating a trained model to forget specific training data without retraining from scratch. In this paper, we investigate how unlearning of deep neural networks (DNNs) is affected by the model parameterization level, which corresponds here to the DNN width. We define validation-based tuning for several unlearning methods from the recent literature, and show how these methods perform differently depending on (i) the DNN parameterization level, (ii) the unlearning goal (unlearned data privacy or bias removal), and (iii) whether the unlearning method explicitly uses the unlearned examples. Our results show that unlearning excels on overparameterized models in terms of balancing generalization against achieving the unlearning goal, although for bias removal this requires the unlearning method to use the unlearned examples. We further elucidate our error-based analysis by measuring how much the unlearning changes the classification decision regions in the proximity of the unlearned examples, and how well it avoids changing them elsewhere. This shows that the unlearning success for overparameterized models stems from the ability to delicately change the model functionality in small regions of the input space while keeping much of the model functionality unchanged.
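To make the decision-region measurement concrete, the following is a minimal sketch of one way such a quantity could be estimated. It is not the paper's exact metric. It assumes inputs are flattened feature vectors, that each model is exposed as a function mapping a batch of inputs to predicted labels, and that "proximity" is probed with Gaussian perturbations. The names `prediction_change_rate`, `radius`, and `n_samples`, and the toy `before`/`after` classifiers, are hypothetical and introduced only for illustration.

```python
import numpy as np


def prediction_change_rate(predict_before, predict_after, centers,
                           radius=0.1, n_samples=50, seed=0):
    """Fraction of random points near `centers` whose predicted label
    differs between the original and the unlearned model.

    Hypothetical helper: proximity is probed with Gaussian noise of
    standard deviation `radius` around each center.
    """
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    n_centers, dim = centers.shape
    # Draw n_samples perturbed copies of every center.
    noise = rng.normal(scale=radius, size=(n_centers, n_samples, dim))
    points = (centers[:, None, :] + noise).reshape(-1, dim)
    # Compare the two models' labels on the same probe points.
    return float(np.mean(predict_before(points) != predict_after(points)))


# Toy 2-D illustration (assumed data, not from the paper).
forget = np.array([[2.0, 2.0]])
others = np.array([[-2.0, -2.0], [2.0, -2.0]])


def before(x):
    # Original model: a linear decision boundary at x0 = 0.
    return (x[:, 0] > 0).astype(int)


def after(x):
    # "Unlearned" model: identical, except labels are flipped in a
    # small ball around the forget example.
    labels = before(x)
    near_forget = np.linalg.norm(x - forget[0], axis=1) < 0.5
    labels[near_forget] = 1 - labels[near_forget]
    return labels


near = prediction_change_rate(before, after, forget)
far = prediction_change_rate(before, after, others)
print(f"change rate near forget examples: {near:.2f}")
print(f"change rate near other examples:  {far:.2f}")
```

In this toy setting, a high change rate near the forget examples combined with a low change rate elsewhere corresponds to the localized behavior the abstract attributes to overparameterized models. For a real DNN, `before` and `after` would wrap the original and unlearned networks' label predictions.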
