Verifying Robust Unlearning: Probing Residual Knowledge in Unlearned Models

Hao Xuan, Xingyu Li

TL;DR

This paper addresses the risk that machine unlearning (MUL) leaves residual knowledge that an adversary can recover. It formalizes Robust Unlearning, which requires that an unlearned model be indistinguishable from a retrained counterpart and resist attempts to resurface forgotten information. It also introduces the Unlearning Mapping Attack (UMA), a post-unlearning verification framework that actively probes for forgotten traces via adversarial inputs. UMA optimizes input perturbations to minimize the difference between pre- and post-unlearning outputs, testing whether forgotten information can still be elicited. Experiments on discriminative and generative tasks show that many state-of-the-art unlearning methods fail this robustness standard. The authors also explore defenses, including adversarial unlearning training and test-time purification, and show that robustness can be improved at the cost of extra computation and a potential loss of accuracy. Overall, UMA provides a practical tool for evaluating unlearning security and motivates the development of more resilient unlearning techniques.

Abstract

Machine Unlearning (MUL) is crucial for privacy protection and content regulation, yet recent studies reveal that traces of forgotten information persist in unlearned models, enabling adversaries to resurface removed knowledge. Existing verification methods only confirm whether unlearning was executed, failing to detect such residual information leaks. To address this, we introduce the concept of Robust Unlearning, ensuring models are indistinguishable from retraining and resistant to adversarial recovery. To empirically evaluate whether unlearning techniques meet this security standard, we propose the Unlearning Mapping Attack (UMA), a post-unlearning verification framework that actively probes models for forgotten traces using adversarial queries. Extensive experiments on discriminative and generative tasks show that existing unlearning techniques remain vulnerable, even when passing existing verification metrics. By establishing UMA as a practical verification tool, this study sets a new standard for assessing and enhancing machine unlearning security.

Paper Structure

This paper contains 17 sections, 2 theorems, 5 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Proposition 2

For an unlearned generative system $f_u(\cdot,\theta^u)$ that satisfies the conditions in (eq_unlearn_dist_2), there may exist an input $\delta_x \notin \mathcal{D}_u$ such that $\|f_u(\delta_x,\theta^u)-f(x,\theta)\|<\varepsilon_1$ for all $x\in \mathcal{D}_u$.
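
The search for such an input can be illustrated with a small sketch. This is not the authors' implementation: it assumes toy linear "models" (plain weight matrices `W_orig` and `W_unl`, both hypothetical), a squared L2 output gap, and a signed-gradient update clipped to an L-infinity ball, in the style of projected gradient descent. The default step size (1/255), attack strength (16/255), and step count (100) are taken from the ablation settings quoted in the figure captions; the helper names `matvec`, `sq_gap`, and `uma_probe` are made up for illustration.

```python
def matvec(W, x):
    """Apply a toy linear 'model' W (list of rows) to input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sq_gap(W_unl, x_adv, target):
    """Squared L2 distance between the unlearned model's output and the target."""
    return sum((o - t) ** 2 for o, t in zip(matvec(W_unl, x_adv), target))


def uma_probe(W_orig, W_unl, x, eps=16 / 255, step=1 / 255, n_steps=100):
    """Search for a perturbation delta with max|delta_j| <= eps that pulls the
    unlearned output W_unl(x + delta) toward the original output W_orig(x).

    Returns the best perturbation seen and its squared output gap.
    Assumption: linear models, so the input gradient is analytic.
    """
    target = matvec(W_orig, x)               # pre-unlearning output to recover
    delta = [0.0] * len(x)
    best_delta, best_gap = list(delta), sq_gap(W_unl, x, target)

    for _ in range(n_steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        resid = [o - t for o, t in zip(matvec(W_unl, x_adv), target)]
        # Gradient of the squared gap w.r.t. the input: 2 * W_unl^T * resid.
        grad = [2 * sum(W_unl[i][j] * resid[i] for i in range(len(resid)))
                for j in range(len(x))]
        # Signed-gradient descent step, then projection onto the eps-ball.
        delta = [max(-eps, min(eps, di - step * ((g > 0) - (g < 0))))
                 for di, g in zip(delta, grad)]
        gap = sq_gap(W_unl, [xi + di for xi, di in zip(x, delta)], target)
        if gap < best_gap:                   # keep the best iterate seen so far
            best_delta, best_gap = list(delta), gap

    return best_delta, best_gap


# Toy usage: the "unlearned" model is a slightly shrunk copy of the original.
W_orig = [[1.0, 0.0], [0.0, 1.0]]
W_unl = [[0.9, 0.0], [0.0, 0.9]]
x = [0.5, 0.5]
before = sq_gap(W_unl, x, matvec(W_orig, x))
delta, after = uma_probe(W_orig, W_unl, x)
print("gap before:", before, "gap after:", after, "delta:", delta)
```

In this toy case a small bounded perturbation closes most of the output gap, which is the kind of residual mapping the proposition warns about. A real probe would replace the analytic gradient with automatic differentiation through the actual networks and would typically also clip the perturbed input to the valid pixel range, which this sketch omits.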

Figures (7)

  • Figure 1: An illustration of the malicious post-MUL attack. With knowledge of the pre- and post-unlearning models, the attacker attempts to recover forgotten information from the unlearned model by injecting carefully-designed noise into the query.
  • Figure 2: Unlearning Mapping Attack on image generation unlearning. The I2I unlearning method [li2024machine] is tested here. Reconstructed images are from the ImageNet1k dataset.
  • Figure 3: Unlearning Mapping Attack on image generation unlearning. The SalUn unlearning method [fan2024salun] is tested here. Reconstructed images are from the ImageNet1k dataset.
  • Figure 4: Ablation on the number of attack iterations. The experiments are run on CIFAR10 with SalUn [fan2024salun] as the baseline unlearning algorithm. All experiments use a fixed step size of 1/255 and an attack strength of 16/255.
  • Figure 5: Ablation on attack step size. The experiments are run on CIFAR10 with SalUn [fan2024salun] as the baseline unlearning algorithm. All experiments use a fixed 100 steps and an attack strength of 16/255.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Definition 1
  • Proposition 2
  • Proposition 3
  • Definition 4