Understanding Fine-tuning in Approximate Unlearning: A Theoretical Perspective

Meng Ding; Rohan Sharma; Changyou Chen; Jinhui Xu; Kaiyi Ji

TL;DR

This work analyzes why naive fine-tuning fails to forget targeted data in an overparameterized linear regression setting, by dissecting the influence of forgetting versus remaining features. It provides a formal, theory-driven view of Original Training, Fine-Tuning, and Golden Unlearning, introducing Remaining Loss and Unlearning Loss as key metrics and showing that FT preserves forgetting-data influence through projection in both distinct and overlapping feature regimes. The authors propose Retention-Based Masking (RBM), which builds masks from the remaining data to remove forgetting components in the pretrained weights, yielding improved unlearning accuracy (UA) while preserving retaining accuracy (RA), especially when overlapping features exist. Empirical results on synthetic data and real datasets (e.g., CIFAR-10/100, TinyImageNet, SVHN) demonstrate that RBM outperforms forgetting-based masking approaches, achieving near-golden unlearning while maintaining strong retaining performance and lower disparity, thereby offering a principled path toward balanced approximate unlearning.

Abstract

Machine Unlearning has emerged as a significant area of research, focusing on "removing" specific subsets of data from a trained model. Fine-tuning (FT) methods have become one of the fundamental approaches for approximating unlearning, as they effectively retain model performance. However, it is consistently observed that naive FT methods struggle to forget the targeted data. In this paper, we present the first theoretical analysis of FT methods for machine unlearning within a linear regression framework, providing a deeper exploration of this phenomenon. Our analysis reveals that while FT models can achieve zero remaining loss, they fail to forget the forgetting data, as the pretrained model retains its influence and the fine-tuning process does not adequately mitigate it. To address this, we propose a novel Retention-Based Masking (RBM) strategy that constructs a weight saliency map based on the remaining dataset, unlike existing methods that focus on the forgetting dataset. Our theoretical analysis demonstrates that RBM not only significantly improves unlearning accuracy (UA) but also ensures higher retaining accuracy (RA) by preserving overlapping features shared between the forgetting and remaining datasets. Experiments on synthetic and real-world datasets validate our theoretical insights, showing that RBM outperforms existing masking approaches in balancing UA, RA, and disparity metrics.
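The masking idea in the abstract can be illustrated with a toy linear model. The sketch below is not the authors' implementation. It assumes a simple reading of RBM in which the mask keeps exactly those coordinates that are active in the remaining data, so overlapping features survive and forgetting-only features are zeroed before fine-tuning. All sizes, variable names, and the use of mean squared error for the remaining loss (RL) and unlearning loss (UL) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature layout: [remaining-only | overlapping | forgetting-only].
d_r, d_o, d_f = 30, 10, 30
n_r, n_f = 20, 10            # fewer samples than features (overparameterized)
d = d_r + d_o + d_f
w_star = rng.normal(size=d)  # ground-truth parameter

# Remaining data uses remaining-only and overlapping features;
# forgetting data uses overlapping and forgetting-only features.
X_r = np.zeros((n_r, d))
X_r[:, : d_r + d_o] = rng.normal(size=(n_r, d_r + d_o))
X_f = np.zeros((n_f, d))
X_f[:, d_r:] = rng.normal(size=(n_f, d_o + d_f))
y_r, y_f = X_r @ w_star, X_f @ w_star


def interpolate_from(X, y, w_init):
    """Interpolating solution closest to w_init, i.e. the point that
    gradient descent on squared loss converges to when started at w_init."""
    return w_init + np.linalg.pinv(X) @ (y - X @ w_init)


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


zeros = np.zeros(d)
w_o = interpolate_from(np.vstack([X_r, X_f]), np.concatenate([y_r, y_f]), zeros)

# Assumed retention-based mask: 1 on every coordinate the remaining data touches.
retain_mask = (np.abs(X_r).sum(axis=0) > 0).astype(float)

w_naive = interpolate_from(X_r, y_r, w_o)               # naive fine-tuning
w_rbm = interpolate_from(X_r, y_r, retain_mask * w_o)   # mask, then fine-tune
w_gold = interpolate_from(X_r, y_r, zeros)              # retrain from scratch

rl_naive, ul_naive = mse(X_r, y_r, w_naive), mse(X_f, y_f, w_naive)
rl_rbm, ul_rbm = mse(X_r, y_r, w_rbm), mse(X_f, y_f, w_rbm)
rl_gold, ul_gold = mse(X_r, y_r, w_gold), mse(X_f, y_f, w_gold)
for name, rl, ul in [("naive FT", rl_naive, ul_naive),
                     ("masked FT", rl_rbm, ul_rbm),
                     ("golden", rl_gold, ul_gold)]:
    print(f"{name:10s} RL={rl:.2e}  UL={ul:.2e}")
```

In this toy setting all three models fit the remaining data, but only naive fine-tuning still fits the forgetting data, because the pretrained weights on forgetting-only coordinates are never touched. The masked weights already interpolate the remaining data here, so the fine-tuning step after masking leaves them essentially unchanged.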

Paper Structure (20 sections, 6 theorems, 35 equations, 3 figures, 4 tables)

Introduction
Related Work
Machine Unlearning in Linear Models
Naive Fine-Tuning Methods Fail To Unlearn
Distinct Features
Overlapping Features
Eliminating Forgetting Data Features from Pre-Trained Model Enhances Unlearning
Rethinking Masking in Approximate Unlearning
Discarding Overlapping Features May Harm Retaining Accuracy
Experiment
Conclusion
Discussion
Experimental Details
Verification via Simulation
Additional Real-world Details
...and 5 more sections

Key Result

Theorem 3.2

Suppose models are trained separately by the fine-tuning procedure (eq:unlearn_fine-tuning) and by retraining from scratch (eq:train_from_scratch). Under assumption asm:ortho_w, it holds that [equation omitted in source]. Here, $\mathbf{w}_t$ denotes the model unlearned via fine-tuning, $\mathbf{w}_g$ denotes the model retrained from scratch, RL denotes the remaining loss on the remaining data, and UL denotes the unlearning loss on the forgetting data.
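The comparison between $\mathbf{w}_t$ and $\mathbf{w}_g$ can be checked numerically in the distinct-features case. The following is a minimal sketch, not a reproduction of the theorem: it assumes RL and UL are mean squared errors, models fine-tuning by the closed-form limit of gradient descent (the interpolating solution closest to the starting weights, which requires a suitable step size), and uses hypothetical dimensions and variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: fewer samples than features, so the model is overparameterized.
d_r, d_f = 40, 40   # remaining-only and forgetting-only feature counts
n_r, n_f = 20, 10   # remaining and forgetting sample counts
d = d_r + d_f
w_star = rng.normal(size=d)  # ground-truth parameter

# Distinct features: remaining data lives on the first d_r coordinates,
# forgetting data on the last d_f coordinates.
X_r = np.zeros((n_r, d))
X_r[:, :d_r] = rng.normal(size=(n_r, d_r))
X_f = np.zeros((n_f, d))
X_f[:, d_r:] = rng.normal(size=(n_f, d_f))
y_r, y_f = X_r @ w_star, X_f @ w_star


def interpolate_from(X, y, w_init):
    """Interpolating solution closest to w_init (limit of gradient descent)."""
    return w_init + np.linalg.pinv(X) @ (y - X @ w_init)


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


zeros = np.zeros(d)
# Original training on all data, starting from zero.
w_o = interpolate_from(np.vstack([X_r, X_f]), np.concatenate([y_r, y_f]), zeros)
# Naive fine-tuning on the remaining data, starting from the pretrained weights.
w_t = interpolate_from(X_r, y_r, w_o)
# Golden model: retraining from scratch on the remaining data only.
w_g = interpolate_from(X_r, y_r, zeros)

rl_t, ul_t = mse(X_r, y_r, w_t), mse(X_f, y_f, w_t)
rl_g, ul_g = mse(X_r, y_r, w_g), mse(X_f, y_f, w_g)
print(f"fine-tuned: RL={rl_t:.2e}  UL={ul_t:.2e}")
print(f"golden:     RL={rl_g:.2e}  UL={ul_g:.2e}")
```

Both models reach a remaining loss of numerically zero. The fine-tuned model also keeps a near-zero unlearning loss, meaning it still predicts the forgetting data, whereas the golden model does not. This matches the abstract's claim that fine-tuning achieves zero remaining loss without removing the pretrained model's dependence on the forgetting data.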

Figures (3)

Figure 1: Machine Unlearning Performance via (Masked) Fine-tuning with (without) Overlapping Features. \ref{fig:noregu_noverlap} and \ref{fig:noregu_overlap} present the relationship between machine unlearning loss (i.e., RA, UA) and the number of fine-tuning data samples under the distinct-features and overlapping-features assumptions, using the naive FT method. In contrast, \ref{fig:regu_noverlap} and \ref{fig:regu_overlap} show the same relationship using masked fine-tuning methods, as discussed in \ref{sec:masked}.
Figure 2: Comparison of Machine Unlearning Loss with and without Overlapping Features. \ref{fig:regu_do1} retains overlapping features from the pretrained model, showing that the masked model $\mathbf{w}_t$ matches the golden model $\mathbf{w}_g$; \ref{fig:regu_do2} discards the overlapping features, showing a decline in retaining accuracy.
Figure 3: Visualization of Remaining Data and Forgetting Data Features Across Various Datasets. Figures \ref{fig:c10_3}-\ref{fig:c10_9} focus on classes 3, 6, and 9 in CIFAR-10, and Figures \ref{fig:c100_30}-\ref{fig:c100_90} focus on classes 30, 60, and 90 in CIFAR-100.

Theorems & Definitions (10)

Theorem 3.2
Theorem 3.4
Theorem 4.1
proof
Corollary C.1: Projection Matrix properties
proof: Proof of \ref{pro:proj_matrix}
Corollary C.2: Minimum Norm Solution 1
proof: Proof of \ref{pro:min_norm_solu}
Corollary C.3: Projection Matrix properties$^{\prime}$
proof: Proof of \ref{pro:proj_matrix2}
