Verification of Machine Unlearning is Fragile

Binchi Zhang, Zihan Chen, Cong Shen, Jundong Li

TL;DR

The paper investigates the safety of machine unlearning verification and shows that current strategies are fragile against dishonest model providers. It introduces two adversarial unlearning methods, one retraining-based and one forging-based, that pass verification while retaining information from the data that was supposedly unlearned. The retraining-based method circumvents both backdoor and reproducing verification; the forging-based method circumvents only a subset of reproducing verification but is more efficient. Both are supported by theoretical guarantees and empirical results on real-world datasets. The work demonstrates that verification alone cannot guarantee genuine unlearning, reports the trade-offs in model utility and efficiency that the attacker faces, and calls for stronger verification mechanisms, discussing potential defense directions.

Abstract

As privacy concerns escalate in the realm of machine learning, data owners now have the option to utilize machine unlearning to remove their data from machine learning models, following recent legislation. To enhance transparency in machine unlearning and avoid potential dishonesty by model providers, various verification strategies have been proposed. These strategies enable data owners to ascertain whether their target data has been effectively unlearned from the model. However, our understanding of the safety issues of machine unlearning verification remains nascent. In this paper, we explore the novel research question of whether model providers can circumvent verification strategies while retaining the information of data supposedly unlearned. Our investigation leads to a pessimistic answer: the verification of machine unlearning is fragile. Specifically, we categorize the current verification strategies regarding potential dishonesty among model providers into two types. Subsequently, we introduce two novel adversarial unlearning processes capable of circumventing both types. We validate the efficacy of our methods through theoretical analysis and empirical experiments using real-world datasets. This study highlights the vulnerabilities and limitations in machine unlearning verification, paving the way for further research into the safety of machine unlearning.

Paper Structure (42 sections, 4 theorems, 32 equations, 8 figures, 8 tables, 2 algorithms)

This paper contains 42 sections, 4 theorems, 32 equations, 8 figures, 8 tables, 2 algorithms.

Key Result

Proposition 4.2

The first adversarial unlearning algorithm (unresolved reference alg:adv1) returns a valid Proof of Retraining (PoRT) under the threshold $\varepsilon=0$.
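
To make the role of the threshold concrete, here is a minimal sketch of a reproducing-style check. It assumes that the verifier replays every recorded update and accepts only if each replayed checkpoint lies within $\varepsilon$ of the recorded one and no batch touches the unlearned data. The linear model, the squared-error loss, and the names `sgd_step`, `record_proof`, and `verify_proof` are illustrative assumptions, not the paper's definitions.

```python
from typing import List, Sequence, Set, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, target)


def sgd_step(w: Vector, batch: Sequence[Example], lr: float) -> Vector:
    """One SGD step on mean squared error for a linear model (illustrative)."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def record_proof(w0: Vector, data: Sequence[Example],
                 batches: Sequence[Sequence[int]], lr: float) -> List[Vector]:
    """Train while logging the checkpoint reached after every step."""
    checkpoints = [list(w0)]
    for idx in batches:
        batch = [data[i] for i in idx]
        checkpoints.append(sgd_step(checkpoints[-1], batch, lr))
    return checkpoints


def verify_proof(checkpoints: Sequence[Vector],
                 batches: Sequence[Sequence[int]],
                 data: Sequence[Example], unlearned: Set[int],
                 lr: float, eps: float = 0.0) -> bool:
    """Hypothetical verifier: replay each step and compare with the record."""
    if len(checkpoints) != len(batches) + 1:
        return False
    for t, idx in enumerate(batches):
        # A valid retraining proof must never use an unlearned sample.
        if any(i in unlearned for i in idx):
            return False
        replayed = sgd_step(checkpoints[t], [data[i] for i in idx], lr)
        dist = sum((a - b) ** 2
                   for a, b in zip(replayed, checkpoints[t + 1])) ** 0.5
        if dist > eps:
            return False
    return True
```

Under these assumptions, a proof recorded from a deterministic run that uses only retained data replays exactly and therefore passes even with `eps=0.0`. The sketch also shows what such a check does not cover: it only inspects the recorded batches and checkpoints, not how the provider chose those batches.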

Figures (8)

  • Figure 1: The connection between our threat model and different verification strategies. Our retraining method can deceive both backdoor and reproducing verification, whereas our forging method can deceive only a subset of reproducing verification but is more efficient.
  • Figure 2: An illustration of the retraining-based adversarial unlearning framework. The Proof of Retraining (PoRT) is generated from the retraining process, in which the sampling of each mini-batch $d_r^{(t)}\in\mathcal{D}\setminus\mathcal{D}_u$ is guided by its gradient similarity to $d^{(t)}\in\mathcal{D}$.
  • Figure 3: An illustration of the forging-based adversarial unlearning framework. Unlike the retraining-based adversarial method, the PoRT here is generated directly from the PoT recorded during the original training. $\bm{w}_r^{(t)}$ (with $d_r^{(t)}$) is obtained by applying the forging map to the PoT instead of using the model updating function $g_r^{(t)}$.
  • Figure 4: Verification error of the forging-based adversarial unlearning method for MLP over MNIST, CNN over CIFAR-10, and ResNet over SVHN.
  • Figure 5: Comparison of execution time among original training, naive retraining, and adversarial unlearning methods over three real-world datasets.
  • ...and 3 more figures
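
The similarity-guided sampling described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a linear model with squared-error loss, draws a fixed number of random candidate batches from the retained data, and measures similarity by Euclidean distance between batch gradients. The names `batch_gradient`, `select_similar_batch`, and `n_candidates` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, target)


def batch_gradient(w: Vector, batch: Sequence[Example]) -> Vector:
    """Mean squared-error gradient of a linear model over one mini-batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(batch)
    return grad


def select_similar_batch(w: Vector, data: Sequence[Example],
                         original_idx: Sequence[int],
                         retained_idx: Sequence[int],
                         n_candidates: int,
                         rng: random.Random) -> Tuple[List[int], float]:
    """Pick a retained-only batch whose gradient is closest to the original.

    original_idx may contain unlearned samples; retained_idx must not.
    Candidate search by random sampling is an assumption of this sketch.
    """
    target = batch_gradient(w, [data[i] for i in original_idx])
    best_idx: List[int] = []
    best_dist = float("inf")
    for _ in range(n_candidates):
        cand = rng.sample(list(retained_idx), len(original_idx))
        grad = batch_gradient(w, [data[i] for i in cand])
        dist = sum((a - b) ** 2 for a, b in zip(grad, target)) ** 0.5
        if dist < best_dist:
            best_idx, best_dist = cand, dist
    return best_idx, best_dist
```

In this sketch the returned batch contains only retained indices, so a model update computed on it never touches $\mathcal{D}_u$ directly, while the distance to the original gradient indicates how closely the update mimics the original training step.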

Theorems & Definitions (9)

  • Definition 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Proposition 4.4
  • Proposition 4.5
  • proof
  • proof
  • proof
  • proof