Verification of Machine Unlearning is Fragile

Binchi Zhang; Zihan Chen; Cong Shen; Jundong Li

TL;DR

The paper investigates the safety of machine unlearning verification and shows that current strategies are fragile against dishonest model providers. It introduces two adversarial unlearning methods, one retraining-based and one forging-based, that pass verification while retaining the information of the data that was supposedly unlearned. According to Figure 1, the retraining method can deceive both backdoor and reproducing verification, whereas the forging method deceives only a subset of reproducing verification but is more efficient. The methods are supported by theoretical guarantees and empirical results on real-world datasets. The work demonstrates that verification alone cannot guarantee genuine unlearning, examines the cost to the attacker in model utility and execution time, and calls for stronger verification mechanisms, discussing potential defense directions.

Abstract

As privacy concerns escalate in the realm of machine learning, data owners now have the option to utilize machine unlearning to remove their data from machine learning models, following recent legislation. To enhance transparency in machine unlearning and avoid potential dishonesty by model providers, various verification strategies have been proposed. These strategies enable data owners to ascertain whether their target data has been effectively unlearned from the model. However, our understanding of the safety issues of machine unlearning verification remains nascent. In this paper, we explore the novel research question of whether model providers can circumvent verification strategies while retaining the information of data supposedly unlearned. Our investigation leads to a pessimistic answer: the verification of machine unlearning is fragile. Specifically, we categorize the current verification strategies regarding potential dishonesty among model providers into two types. Subsequently, we introduce two novel adversarial unlearning processes capable of circumventing both types. We validate the efficacy of our methods through theoretical analysis and empirical experiments using real-world datasets. This study highlights the vulnerabilities and limitations in machine unlearning verification, paving the way for further research into the safety of machine unlearning.

Paper Structure (42 sections, 4 theorems, 32 equations, 8 figures, 8 tables, 2 algorithms)

Introduction
Related Works
Machine Unlearning
Verification for Machine Unlearning
Threat Model
Adversary's Goal.
Adversary's Knowledge.
Methodology
Preliminary
Notation.
Proof of Retraining.
Reproducing Verification.
First Adversarial Method (Retraining)
Second Adversarial Method (Forging)
Experiments
...and 27 more sections

Key Result

Proposition 4.2

The first adversarial method (alg:adv1) returns a valid Proof of Retraining under the threshold $\varepsilon=0$.
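
To make the notion of a "valid Proof of Retraining under a threshold $\varepsilon$" more concrete, the following is a minimal sketch of what a reproducing check might look like. It is not the paper's algorithm: the one-dimensional least-squares model, the update rule `sgd_step`, the proof layout (a list of batch ids paired with recorded weights), and the names `make_proof` and `verify_port` are all assumptions made for illustration. The verifier replays each recorded step, rejects any step that touches forgotten data, and rejects any step whose reproduced weight differs from the recorded one by more than `eps`.

```python
def sgd_step(w, batch, lr=0.1):
    """One SGD step for 1-D least squares y ~ w * x (hypothetical update rule)."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def make_proof(w0, batches, dataset, update=sgd_step):
    """Record (batch_ids, weight_after_step) for each training step."""
    proof, w = [], w0
    for ids in batches:
        w = update(w, [dataset[i] for i in ids])
        proof.append((ids, w))
    return proof


def verify_port(proof, w0, forget_ids, dataset, eps=0.0, update=sgd_step):
    """Replay the proof; accept only if every step avoids forgotten data
    and reproduces the recorded weight within eps."""
    w = w0
    for batch_ids, recorded_w in proof:
        # A retraining proof must not use any sample that was to be unlearned.
        if any(i in forget_ids for i in batch_ids):
            return False
        reproduced = update(w, [dataset[i] for i in batch_ids])
        if abs(reproduced - recorded_w) > eps:
            return False
        # Continue from the recorded checkpoint, as a step-wise verifier would.
        w = recorded_w
    return True


if __name__ == "__main__":
    data = {0: (1.0, 2.0), 1: (2.0, 4.0), 2: (3.0, 6.0)}
    proof = make_proof(0.0, [[0, 1], [1], [0]], data)
    print(verify_port(proof, 0.0, forget_ids={2}, dataset=data, eps=0.0))
```

With `eps=0.0` the check demands exact reproduction, which this toy setting allows because the replay is deterministic. A check of this kind only confirms that the recorded trajectory is self-consistent and free of forgotten samples; the paper's argument is that passing it does not by itself show that the deployed model has forgotten the data.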

Figures (8)

Figure 1: The connection between our threat model and different verification strategies. Our retraining method can deceive both backdoor and reproducing verification, whereas our forging method can deceive only a subset of reproducing verification but is more efficient.
Figure 2: An illustration of the retraining-based adversarial unlearning framework. The Proof of Retraining (PoRT) is generated from the retraining process, in which the sampling of each mini-batch $d_r^{(t)}\in\mathcal{D}\backslash\mathcal{D}_u$ is guided by its gradient similarity to $d^{(t)}\in\mathcal{D}$.
Figure 3: An illustration of the forging-based adversarial unlearning framework. Unlike in the retraining-based adversarial method, the PoRT here is generated directly from the PoT recorded during the original training. $\bm{w}_r^{(t)}$ (with $d_r^{(t)}$) is obtained by applying the forging map to the PoT instead of using the model updating function $g_r^{(t)}$.
Figure 4: Verification error of forging-based adversarial unlearning method for MLP over MNIST, CNN over CIFAR-10, and ResNet over SVHN.
Figure 5: Comparison of execution time among original training, naive retraining, and adversarial unlearning methods over three real-world datasets.
...and 3 more figures
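
The Figure 2 caption describes choosing a retained mini-batch $d_r^{(t)}$ whose gradient resembles that of the original mini-batch $d^{(t)}$. One simple way such a selection could be sketched is shown below. This is an illustrative assumption, not the paper's sampling procedure: it draws a few random candidate batches from the retained ids and keeps the one with the smallest gradient gap, on a hypothetical one-dimensional least-squares model. The names `batch_grad` and `pick_similar_batch` and the parameter `n_candidates` are invented for this sketch.

```python
import random


def batch_grad(w, batch):
    """Mean gradient of the squared error (w * x - y)^2 over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def pick_similar_batch(w, orig_ids, retain_ids, dataset, n_candidates=20, seed=0):
    """Among random candidate batches drawn from the retained ids, return the
    one whose gradient is closest to the gradient of the original batch."""
    rng = random.Random(seed)
    target = batch_grad(w, [dataset[i] for i in orig_ids])
    best_ids, best_gap = None, float("inf")
    for _ in range(n_candidates):
        # Candidates come only from retained data, never from forgotten samples.
        cand = rng.sample(retain_ids, len(orig_ids))
        gap = abs(batch_grad(w, [dataset[i] for i in cand]) - target)
        if gap < best_gap:
            best_ids, best_gap = cand, gap
    return best_ids, best_gap


if __name__ == "__main__":
    data = {i: (float(i + 1), 2.0 * (i + 1) + 0.1 * i) for i in range(6)}
    # Sample 5 is treated as the data to be unlearned in this toy example.
    ids, gap = pick_similar_batch(0.5, [0, 5], [0, 1, 2, 3, 4], data)
    print(ids, gap)
```

The smaller the gap, the more closely a retraining trajectory built from such batches would track the original one while formally using only retained data.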

Theorems & Definitions (9)

Definition 4.1
Proposition 4.2
Proposition 4.3
Proposition 4.4
Proposition 4.5
proof
proof
proof
proof
