Understanding Fine-tuning in Approximate Unlearning: A Theoretical Perspective
Meng Ding, Rohan Sharma, Changyou Chen, Jinhui Xu, Kaiyi Ji
TL;DR
This work analyzes why naive fine-tuning (FT) fails to forget targeted data in an overparameterized linear regression setting by separating the influence of forgetting features from that of remaining features. It gives a theory-driven comparison of Original Training, Fine-Tuning, and Golden Unlearning, introduces Remaining Loss and Unlearning Loss as the key metrics, and shows that FT preserves the influence of the forgetting data through projection in both the distinct-feature and overlapping-feature regimes. The authors propose Retention-Based Masking (RBM), which builds masks from the remaining data to remove forgetting components from the pretrained weights, improving unlearning accuracy (UA) while preserving retaining accuracy (RA), especially when overlapping features exist. Experiments on synthetic data and real datasets (e.g., CIFAR-10/100, TinyImageNet, SVHN) show that RBM outperforms forgetting-based masking approaches, reaching near-golden unlearning with strong retaining performance and lower disparity.
Abstract
Machine Unlearning has emerged as a significant area of research, focusing on "removing" specific subsets of data from a trained model. Fine-tuning (FT) methods have become one of the fundamental approaches for approximating unlearning, as they effectively retain model performance. However, it is consistently observed that naive FT methods struggle to forget the targeted data. In this paper, we present the first theoretical analysis of FT methods for machine unlearning within a linear regression framework, providing a deeper exploration of this phenomenon. Our analysis reveals that while FT models can achieve zero remaining loss, they fail to forget the forgetting data, as the pretrained model retains its influence and the fine-tuning process does not adequately mitigate it. To address this, we propose a novel Retention-Based Masking (RBM) strategy that constructs a weight saliency map based on the remaining dataset, unlike existing methods that focus on the forgetting dataset. Our theoretical analysis demonstrates that RBM not only significantly improves unlearning accuracy (UA) but also ensures higher retaining accuracy (RA) by preserving overlapping features shared between the forgetting and remaining datasets. Experiments on synthetic and real-world datasets validate our theoretical insights, showing that RBM outperforms existing masking approaches in balancing UA, RA, and disparity metrics.
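The phenomenon described in the abstract can be reproduced in a toy setting. The following is a minimal NumPy sketch, not the authors' code: it assumes the distinct-feature regime (remaining and forgetting samples occupy disjoint coordinate blocks), noiseless labels, and that gradient descent on the squared loss converges to the interpolating solution closest to its starting point. All names (`min_norm_fit`, `w_orig`, `w_ft`, `w_gold`, `w_rbm`) and the mask rule (keep coordinates where the remaining data has nonzero feature mass) are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_f = 20, 10   # remaining / forgetting sample counts (assumed values)
d_r, d_f = 60, 40   # coordinates used by each subset; d_r + d_f > n_r + n_f
d = d_r + d_f

# Distinct-feature regime: each subset touches its own block of coordinates.
X_r = np.hstack([rng.normal(size=(n_r, d_r)), np.zeros((n_r, d_f))])
X_f = np.hstack([np.zeros((n_f, d_r)), rng.normal(size=(n_f, d_f))])
w_star = rng.normal(size=d)
y_r, y_f = X_r @ w_star, X_f @ w_star   # noiseless labels

def min_norm_fit(X, y, w_init):
    """Interpolating solution closest to w_init, i.e. the limit of
    gradient descent on the squared loss started from w_init."""
    return w_init + np.linalg.pinv(X) @ (y - X @ w_init)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Original Training: fit all data from zero initialization.
X_all, y_all = np.vstack([X_r, X_f]), np.concatenate([y_r, y_f])
w_orig = min_norm_fit(X_all, y_all, np.zeros(d))

# Fine-Tuning: continue from w_orig using only the remaining data.
w_ft = min_norm_fit(X_r, y_r, w_orig)

# Golden Unlearning: retrain from scratch on the remaining data.
w_gold = min_norm_fit(X_r, y_r, np.zeros(d))

# Retention-based mask (assumed rule): keep coordinates the remaining data uses,
# zero the rest of the pretrained weights, then fine-tune on the remaining data.
mask = (np.abs(X_r).sum(axis=0) > 0).astype(float)
w_rbm = min_norm_fit(X_r, y_r, mask * w_orig)

for name, w in [("fine-tuned", w_ft), ("golden", w_gold), ("masked + FT", w_rbm)]:
    print(f"{name:12s} remaining loss={mse(X_r, y_r, w):.2e} "
          f"unlearning loss={mse(X_f, y_f, w):.2e}")
```

In this sketch the pretrained weights already interpolate the remaining data, so fine-tuning leaves them unchanged and the fine-tuned model still fits the forgetting data. Zeroing the coordinates that the remaining data never uses before fine-tuning recovers the golden solution, which is the behaviour the masking idea is meant to capture in this simplified regime.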
