
Machine Unlearning under Retain-Forget Entanglement

Jingpu Cheng, Ping Liu, Qianxiao Li, Chi Zhang

Abstract

Forgetting a subset in machine unlearning is rarely an isolated task. Often, retained samples that are closely related to the forget set can be unintentionally affected, particularly when they share correlated features from pretraining or exhibit strong semantic similarities. To address this challenge, we propose a novel two-phase optimization framework specifically designed to handle such retain-forget entanglements. In the first phase, an augmented Lagrangian method increases the loss on the forget set while preserving accuracy on less-related retained samples. The second phase applies a gradient projection step, regularized by the Wasserstein-2 distance, to mitigate performance degradation on semantically related retained samples without compromising the unlearning objective. We validate our approach through comprehensive experiments on multiple unlearning tasks, standard benchmark datasets, and diverse neural architectures, demonstrating that it achieves effective and reliable unlearning while outperforming existing baselines in both accuracy retention and removal fidelity.
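The two-phase scheme sketched in the abstract can be illustrated on toy quadratic surrogate losses. Everything below (the targets `t_f`, `t_r`, `t_adj`, the step size `eta`, the penalty weight `rho`, and the specific loss forms) is an illustrative stand-in and not the paper's implementation: phase 1 runs augmented-Lagrangian ascent on a forget loss subject to a bound on a remote retain loss; phase 2 descends an adjacent retain loss with the component along the forget-loss gradient projected out, so the unlearning achieved in phase 1 is preserved to first order.

```python
import numpy as np

# Toy quadratic surrogates: L(theta, t) is low near target t.
t_f = np.array([2.0, 0.0])     # "forget" target (model currently fits it)
t_r = np.array([0.0, 1.0])     # remote retain target
t_adj = np.array([0.0, -1.0])  # adjacent retain target


def L(theta, t):
    return 0.5 * np.sum((theta - t) ** 2)


# Phase 1: augmented-Lagrangian ascent on the forget loss L_f, subject to
# keeping the remote retain loss L_r at or below its initial value eps.
theta = t_f + np.array([0.0, 0.01])  # tiny perturbation off the stationary point
eps = L(theta, t_r)
lam, rho, eta = 0.0, 1.0, 0.05
for k in range(2000):
    c = L(theta, t_r) - eps                    # constraint violation
    mult = max(0.0, lam + rho * c)             # active multiplier term
    g = -(theta - t_f) + mult * (theta - t_r)  # grad of (-L_f + AL penalty)
    theta = theta - eta * g
    if k % 25 == 0:
        lam = max(0.0, lam + rho * (L(theta, t_r) - eps))  # multiplier update

lf_1, lr_1, la_1 = L(theta, t_f), L(theta, t_r), L(theta, t_adj)

# Phase 2: gradient projection. Descend the adjacent retain loss, but remove
# the component of its gradient along the forget-loss gradient; the step is
# orthogonal to grad L_f, so L_f cannot decrease.
for _ in range(500):
    g_f = theta - t_f
    g_a = theta - t_adj
    g_proj = g_a - (g_a @ g_f) / (g_f @ g_f) * g_f
    theta = theta - eta * g_proj

lf_2, la_2 = L(theta, t_f), L(theta, t_adj)
```

After phase 1 the forget loss has risen while the remote retain loss stays near its initial level; after phase 2 the adjacent retain loss drops without undoing the forgetting. The Wasserstein-2 regularization of the projection step described in the abstract is omitted here for brevity.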

Paper Structure

This paper contains 42 sections, 2 theorems, 25 equations, 3 figures, 15 tables, 1 algorithm.

Key Result

Proposition 4.1

Assume $\tilde{\mathcal{L}}_{f}(\theta)$, $\mathcal{L}_{r}^{\text{adj}}(\theta)$, and $\mathcal{L}_{r}^{\text{rem}}(\theta)$ are twice continuously differentiable with respect to $\theta$. Let $\Delta\theta$ be the update of $\theta$ induced by the update rule (eq:update-rule). Then, for sufficiently small $\eta > 0$, we have:
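As general background for a statement of this shape (this is the standard expansion, not the proposition's own conclusion): for a twice continuously differentiable loss $\mathcal{L}$ and an update $\Delta\theta = O(\eta)$, Taylor's theorem gives

```latex
% Generic first-order expansion; the O(\eta^2) remainder uses twice
% continuous differentiability of \mathcal{L}.
\mathcal{L}(\theta + \Delta\theta)
  = \mathcal{L}(\theta)
  + \nabla_{\theta}\mathcal{L}(\theta)^{\top} \Delta\theta
  + O(\eta^{2}).
```

Applied to each of the three losses above, such an expansion characterizes the first-order effect of the update on the forget, adjacent-retain, and remote-retain objectives.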

Figures (3)

  • Figure 1: Training dynamics of PGD and cross-entropy loss distributions on $\mathcal{D}_f$. (a) Loss and accuracy curves of PGD during the second stage; (b) Original loss distribution on $\mathcal{D}_f$ after the first stage; (c) Loss distribution on $\mathcal{D}_f$ after applying PGD in the second stage; (d) Loss distribution on $\mathcal{D}_f$ after applying W-PGD. Comparing panels (b) and (c), PGD notably skews the loss distribution, with some samples attaining near-zero loss. In contrast, W-PGD (d) preserves a distribution closer to the original and effectively avoids assigning low loss to forget-set samples.
  • Figure 2: Learning dynamics of our method in the second stage on CIFAR100 with ResNet-18. The left panel shows the training accuracy and the right panel shows the test accuracy. The adjacent retain set contains all adjacent samples, while the remote retain set contains all remote samples.
  • Figure 3: MIA efficacy of different unlearning methods on CIFAR100 using ResNet-18.

Theorems & Definitions (4)

  • Proposition 4.1
  • Proposition 4.2
  • proof: Proof of \ref{prop:taylor}
  • proof: Proof of \ref{prop:w-pgd}