Towards Probabilistic Verification of Machine Unlearning

David Marco Sommer, Liwei Song, Sameer Wagh, Prateek Mittal

TL;DR

This work addresses the challenge of verifying right-to-be-forgotten deletion in MLaaS by framing unlearning verification as a hypothesis test and leveraging user-specific backdoors as verifiable traces. It introduces a rigorous theoretical framework that yields a closed-form deletion confidence ρ based on backdoor success rates p and q, and demonstrates high-confidence detection of non-compliance with modest participation (as low as 5%) and limited test queries. The approach is validated across five datasets and four neural architectures, and remains effective even under adaptive backdoor defenses, though performance degrades with more complex data and defense strategies. Together, the framework, backdoor-based mechanism, and extensive empirical results provide a quantitative foundation for verifiable machine unlearning with potential regulatory and practical impact in MLaaS systems.

Abstract

The right to be forgotten, also known as the right to erasure, is the right of individuals to have their data erased from an entity storing it. The status of this long-held notion was legally solidified recently by the General Data Protection Regulation (GDPR) in the European Union. Consequently, there is a need for mechanisms whereby users can verify whether service providers comply with their deletion requests. In this work, we take the first step in proposing a formal framework to study the design of such verification mechanisms for data deletion requests -- also known as machine unlearning -- in the context of systems that provide machine learning as a service (MLaaS). Our framework allows the rigorous quantification of any verification mechanism based on standard hypothesis testing. Furthermore, we propose a novel backdoor-based verification mechanism and demonstrate its effectiveness in certifying data deletion with high confidence, thus providing a basis for quantitatively inferring machine unlearning. We evaluate our approach over a range of network architectures such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), residual networks (ResNet), and long short-term memory (LSTM) networks, as well as over 5 different datasets. We demonstrate that our approach has minimal effect on the ML service's accuracy but provides high-confidence verification of unlearning. Our proposed mechanism works even if only a handful of users employ our system to ascertain compliance with data deletion requests. In particular, with just 5% of users participating, modifying half their data with a backdoor, and with merely 30 test queries, our verification mechanism has both false positive and false negative ratios below $10^{-3}$. We also show the effectiveness of our approach by testing it against an adaptive adversary that uses a state-of-the-art backdoor defense method.

Paper Structure

This paper contains 22 sections, 2 theorems, 13 equations, 8 figures, 4 tables.

Key Result

Theorem 1

For a given ML-mechanism $A$ and a given acceptable Type I error probability $\alpha$, the deletion confidence $\rho_{A, \alpha}(s, n)$ is given by a closed-form expression in the backdoor success rates $p$ and $q$ (defined in the paper's equation for $p$ and $q$). In that expression, $H(\cdot)$ is the Heaviside step function applied to a condition, i.e., $H(x) = 1$ if $x$ is $\mathsf{True}$ and $0$ otherwise.
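
To make the quantities in Theorem 1 concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes that each of the $n$ backdoored test queries succeeds independently, with rate $q$ if the data was deleted ($H_0$) and rate $p$ if it was not ($H_1$), so that the success count is binomial. It also assumes the server is flagged as non-compliant when the count exceeds a threshold. The function names and the threshold-selection rule are illustrative choices, not taken from the paper.

```python
from math import comb


def binom_pmf(k, n, rate):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * rate**k * (1 - rate) ** (n - k)


def error_rates(n, p, q, k_thresh):
    """Type I and Type II errors for the rule: reject H0 if successes > k_thresh.

    Assumption: H0 = data deleted (success rate q), H1 = data kept (rate p).
    """
    # Type I: a compliant server is flagged (too many successes under q).
    alpha = sum(binom_pmf(k, n, q) for k in range(k_thresh + 1, n + 1))
    # Type II: a non-compliant server passes (too few successes under p).
    beta = sum(binom_pmf(k, n, p) for k in range(0, k_thresh + 1))
    return alpha, beta


def deletion_confidence(n, p, q, alpha_max):
    """Smallest threshold whose Type I error is at most alpha_max.

    Returns (k_thresh, alpha, 1 - beta); 1 - beta plays the role of the
    deletion confidence in this sketch.
    """
    for k_thresh in range(n + 1):
        alpha, beta = error_rates(n, p, q, k_thresh)
        if alpha <= alpha_max:
            return k_thresh, alpha, 1.0 - beta
    raise ValueError("alpha_max must be non-negative")


if __name__ == "__main__":
    # Values from the Figure 2 caption: n = 5, q = 0.1, p = 0.8.
    print(deletion_confidence(n=5, p=0.8, q=0.1, alpha_max=0.01))
```

With the values from the Figure 2 caption ($n=5$, $q=0.1$, $p=0.8$) and an acceptable Type I error of 0.01, this sketch selects a threshold of 2 successes, giving a Type I error of about 0.0086 and a confidence of about 0.94.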

Figures (8)

  • Figure 1: (Overall system operation.) First, users inject backdoor samples over which the server trains the model. At a later stage, users leverage model predictions on backdoored test samples to detect whether the server followed their deletion requests or not -- as shown by the difference between the figure's two panels (the $H_0$ and $H_1$ cases).
  • Figure 2: Intuitive relation between the threshold $t$ and the Type I ($\alpha$) and Type II ($\beta$) errors for $n=5$ measured samples, with $q=0.1$ and $p=0.8$.
  • Figure 3: Our backdoor-based machine unlearning verification results with a non-adaptive server. Each row of plots is evaluated on the dataset named at its far left. Each column of plots shows the evaluation named in the caption beneath it. The colored areas in columns (a) and (c) mark the 10% to 90% quantiles.
  • Figure 4: Our backdoor-based machine unlearning verification results with an adaptive server. Each row of plots is evaluated on the dataset named at its far left. Each column of plots shows the evaluation named in the caption beneath it. The AG News dataset is omitted because Neural Cleanse is not applicable to non-continuous datasets. The colored areas in columns (a) and (c) mark the 10% to 90% quantiles.
  • Figure 5: The CDFs of backdoor attack accuracy for deleted and undeleted users for different datasets ($f_{\mathsf{user}}=0.05$, $f_{\mathsf{data}} = 50\%$).
  • ...and 3 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Definition 1
  • Lemma 1: Measured backdoor success rate
  • proof
  • proof: Proof of Theorem 1