
Machine Unlearning under Retain-Forget Entanglement

Jingpu Cheng, Ping Liu, Qianxiao Li, Chi Zhang

Abstract

Forgetting a subset in machine unlearning is rarely an isolated task. Often, retained samples that are closely related to the forget set can be unintentionally affected, particularly when they share correlated features from pretraining or exhibit strong semantic similarities. To address this challenge, we propose a novel two-phase optimization framework specifically designed to handle such retain-forget entanglements. In the first phase, an augmented Lagrangian method increases the loss on the forget set while preserving accuracy on less-related retained samples. The second phase applies a gradient projection step, regularized by the Wasserstein-2 distance, to mitigate performance degradation on semantically related retained samples without compromising the unlearning objective. We validate our approach through comprehensive experiments on multiple unlearning tasks, standard benchmark datasets, and diverse neural architectures, demonstrating that it achieves effective and reliable unlearning while outperforming existing baselines in both accuracy retention and removal fidelity.
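The two-phase scheme sketched in the abstract can be illustrated on toy quadratic surrogate losses. Everything below (the targets `t_f`, `t_r`, `t_adj`, the step size `eta`, the penalty weight `rho`, and the specific loss forms) is an illustrative stand-in and not the paper's implementation: phase 1 runs augmented-Lagrangian ascent on a forget loss subject to a bound on a remote retain loss; phase 2 descends an adjacent retain loss with the component along the forget-loss gradient projected out, so the unlearning achieved in phase 1 is preserved to first order.

```python
import numpy as np

# Toy quadratic surrogates: L(theta, t) is low near target t.
t_f = np.array([2.0, 0.0])     # "forget" target (model currently fits it)
t_r = np.array([0.0, 1.0])     # remote retain target
t_adj = np.array([0.0, -1.0])  # adjacent retain target


def L(theta, t):
    return 0.5 * np.sum((theta - t) ** 2)


# Phase 1: augmented-Lagrangian ascent on the forget loss L_f, subject to
# keeping the remote retain loss L_r at or below its initial value eps.
theta = t_f + np.array([0.0, 0.01])  # tiny perturbation off the stationary point
eps = L(theta, t_r)
lam, rho, eta = 0.0, 1.0, 0.05
for k in range(2000):
    c = L(theta, t_r) - eps                    # constraint violation
    mult = max(0.0, lam + rho * c)             # active multiplier term
    g = -(theta - t_f) + mult * (theta - t_r)  # grad of (-L_f + AL penalty)
    theta = theta - eta * g
    if k % 25 == 0:
        lam = max(0.0, lam + rho * (L(theta, t_r) - eps))  # multiplier update

lf_1, lr_1, la_1 = L(theta, t_f), L(theta, t_r), L(theta, t_adj)

# Phase 2: gradient projection. Descend the adjacent retain loss, but remove
# the component of its gradient along the forget-loss gradient; the step is
# orthogonal to grad L_f, so L_f cannot decrease.
for _ in range(500):
    g_f = theta - t_f
    g_a = theta - t_adj
    g_proj = g_a - (g_a @ g_f) / (g_f @ g_f) * g_f
    theta = theta - eta * g_proj

lf_2, la_2 = L(theta, t_f), L(theta, t_adj)
```

After phase 1 the forget loss has risen while the remote retain loss stays near its initial level; after phase 2 the adjacent retain loss drops without undoing the forgetting. The Wasserstein-2 regularization of the projection step described in the abstract is omitted here for brevity.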

Paper Structure

This paper contains 42 sections, 2 theorems, 25 equations, 3 figures, 15 tables, 1 algorithm.

Key Result

Proposition 4.1

Assume $\tilde{\mathcal{L}}_{f}(\theta)$, $\mathcal{L}_{r}^{\text{adj}}(\theta)$, and $\mathcal{L}_{r}^{\text{rem}}(\theta)$ are twice continuously differentiable with respect to $\theta$. Let $\Delta\theta$ be the update of $\theta$ induced by the update rule (eq:update-rule). Then, for sufficiently small $\eta > 0$, we have:
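As general background for a statement of this shape (this is the standard expansion, not the proposition's own conclusion): for a twice continuously differentiable loss $\mathcal{L}$ and an update $\Delta\theta = O(\eta)$, Taylor's theorem gives

```latex
% Generic first-order expansion; the O(\eta^2) remainder uses twice
% continuous differentiability of \mathcal{L}.
\mathcal{L}(\theta + \Delta\theta)
  = \mathcal{L}(\theta)
  + \nabla_{\theta}\mathcal{L}(\theta)^{\top} \Delta\theta
  + O(\eta^{2}).
```

Applied to each of the three losses above, such an expansion characterizes the first-order effect of the update on the forget, adjacent-retain, and remote-retain objectives.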

Figures (3)

  • Figure 1: Training dynamics of PGD and cross-entropy loss distributions on $\mathcal{D}_f$. (a) Loss and accuracy curves of PGD during the second stage; (b) Original loss distribution on $\mathcal{D}_f$ after the first stage; (c) Loss distribution on $\mathcal{D}_f$ after applying PGD in the second stage; (d) Loss distribution on $\mathcal{D}_f$ after applying W-PGD. Comparing panels (b) and (c), PGD notably skews the loss distribution, with some samples attaining near-zero loss. In contrast, W-PGD (d) preserves a distribution closer to the original and effectively avoids assigning low loss to forget-set samples.
  • Figure 2: Learning dynamics of our method in the second stage on CIFAR100 with ResNet-18. The left panel shows the training accuracy and the right panel shows the test accuracy. The adjacent retain set contains all adjacent samples, while the remote retain set contains all remote samples.
  • Figure 3: MIA efficacy of different unlearning methods on CIFAR100 using ResNet-18.

Theorems & Definitions (4)

  • Proposition 4.1
  • Proposition 4.2
  • proof: Proof of \ref{prop:taylor}
  • proof: Proof of \ref{prop:w-pgd}