Verifiable and Provably Secure Machine Unlearning

Thorsten Eisenhofer, Doreen Riepel, Varun Chandrasekaran, Esha Ghosh, Olga Ohrimenko, Nicolas Papernot

TL;DR

This work recasts machine unlearning as a cryptographic security problem and introduces a formal, iteration-based framework for verifiable unlearning. It provides a general abstraction via admissible functions and proves completeness and security for any instantiation; it then presents a practical SNARK- and hash-chain-based instantiation implemented with Spartan, applicable to retraining-based, amnesiac, and optimization-based unlearning across linear and logistic regression and small neural networks. The approach yields verifiable proofs of training, unlearning, and non-membership, enabling auditors to certify that a deleted data point has been removed and not reintroduced. This framework enables auditable data deletion in regulated contexts and offers a path toward scalable, cryptographically sound unlearning across diverse ML paradigms.

Abstract

Machine unlearning aims to remove points from the training dataset of a machine learning model after training, for example when a user requests their data to be deleted. While many unlearning methods have been proposed, none of them enable users to audit the procedure. Furthermore, recent work shows that a user cannot verify whether their data was unlearnt by inspecting the model parameters alone. Rather than reasoning about parameters, we propose to view verifiable unlearning as a security problem. To this end, we present the first cryptographic definition of verifiable unlearning to formally capture the guarantees of an unlearning system. In this framework, the server first computes a proof that the model was trained on a dataset D. Given a user's data point d requested to be deleted, the server updates the model using an unlearning algorithm. It then provides a proof that unlearning was executed correctly and that d is not part of D', where D' is the new training dataset (i.e., d has been removed). Our framework is generally applicable to different unlearning techniques, which we abstract as admissible functions. We instantiate a protocol in the framework, based on cryptographic assumptions, using SNARKs and hash chains. Finally, we implement the protocol for three different unlearning techniques and validate its feasibility for linear regression, logistic regression, and neural networks.
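
As a rough illustration of the hash-chain ingredient mentioned in the abstract, the following sketch commits to a sequence of data points with a hash chain. It is a toy under stated assumptions, not the paper's construction: the names `extend_chain` and `commit`, the all-zero start value `GENESIS`, and the choice of SHA-256 are illustrative, and the SNARK that would prove statements about the chain without revealing the data is omitted entirely.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed start value of the chain (illustrative choice)


def extend_chain(head: bytes, point: bytes) -> bytes:
    """Append one data point: new_head = H(head || H(point))."""
    return hashlib.sha256(head + hashlib.sha256(point).digest()).digest()


def commit(points, head: bytes = GENESIS) -> bytes:
    """Fold a sequence of data points into a single 32-byte chain head."""
    for point in points:
        head = extend_chain(head, point)
    return head


# Commitment to the original dataset D.
dataset = [b"alice", b"bob", b"carol"]
com_before = commit(dataset)

# Commitment to D' after the point b"bob" has been deleted.
dataset_after = [p for p in dataset if p != b"bob"]
com_after = commit(dataset_after)

# Extending a published head gives the same result as recommitting from scratch,
# so a verifier can follow additions without restarting the chain.
assert commit([b"dave"], head=com_after) == commit(dataset_after + [b"dave"])
```

Because each head depends on every earlier point, any change to the committed sequence yields a different head, assuming the hash function is collision resistant.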

Paper Structure

This paper contains 24 sections, 2 theorems, 9 equations, 4 figures, 2 tables.

Key Result

Theorem 1

Let $\Pi$ be a complete SNARK and $\mathsf{Hash}$ a collision-resistant hash function. Then the instantiated protocol satisfies completeness.

Figures (4)

  • Figure 1: Unlearning Framework. We describe protocols in this framework based on a set of admissible functions $f$. After initialization, execution proceeds in iterations. At the beginning of each iteration $i$, users $\mathcal{U}$ can issue requests for data to be added or deleted. After this phase, the server $S$ either performs a proof of training, adding the requested data records in $\mathsf{dataAdd}_i$ to the model, or a proof of unlearning, removing the requested data records in $\mathsf{dataUnlearnAdd}_i$. It computes a commitment $\mathsf{com}_i$ to the updated model $\mathsf{model}_i$ and the updated training dataset. Furthermore, the server computes a proof $\mathsf{proofModel}_i$ that $\mathsf{model}_i$ was obtained from this dataset. The users verify this proof and the commitment. In each unlearning iteration, the server additionally creates a proof of non-membership for every unlearnt data point, confirming to the user that it has complied with the data deletion request. This proof can be verified by the user against $\mathsf{com}_i$.
  • Figure 2: Security Game. We define the security of a protocol $\Phi_f$ in terms of the game $\mathsf{GameUnlearn}$. The notation $(\mathcal{A}\|\mathcal{E})$ denotes that both algorithms are run on the same input and random coins, with their outputs assigned to the variables before and after the semicolon, respectively. The input $\mathsf{aux}$ refers to auxiliary input.
  • Figure 3: Circuits $C_{U}$. Based on the circuit, we prove correct execution of admissible functions for the proof of unlearning.
  • Figure 4: Games $\mathsf{Game}_0$-$\mathsf{Game}_2$ for the proof of Theorem 2 (security). We prove unlearning security for our instantiated protocol $\Phi_f$ in the appendix, where $f=(f_{I},f_{T},f_{U})$ and the hyperparameters $\mathsf{pp}_f$ are fixed by the participating parties and determine the relations $R_{I}$, $R_{T}$ and $R_{U}$.
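
The iteration flow described in the Figure 1 caption can be sketched roughly as follows. Everything here is a hypothetical stand-in: the `Server` class, the `commitment` and `verify` helpers, and the toy "model" (the mean of the data, retrained from scratch) are assumptions made for illustration. In particular, `verify` recomputes the commitment from the full data, whereas the actual protocol would use a SNARK so that the verifier does not need to see the dataset.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over the concatenation of the given byte strings."""
    return hashlib.sha256(b"".join(parts)).digest()


def chain(points, head: bytes = b"\x00" * 32) -> bytes:
    """Hash chain over a list of floats: head = H(head || H(point))."""
    for p in points:
        head = h(head, h(repr(p).encode()))
    return head


def commitment(model: float, data, unlearnt) -> bytes:
    """Toy commitment to the model, the training set, and the unlearnt set."""
    return h(h(repr(model).encode()), chain(data), chain(unlearnt))


class Server:
    """Hypothetical server keeping the training set and all unlearnt points."""

    def __init__(self):
        self.data = []      # current training dataset
        self.unlearnt = []  # every point deleted so far

    def _retrain(self):
        # Retraining-based stand-in: the "model" is the mean of the data.
        model = sum(self.data) / len(self.data) if self.data else 0.0
        return model, commitment(model, self.data, self.unlearnt)

    def add(self, points):
        # An unlearnt point must never be reintroduced.
        if any(p in self.unlearnt for p in points):
            raise ValueError("unlearnt point must not be re-added")
        self.data.extend(points)
        return self._retrain()

    def unlearn(self, points):
        self.data = [p for p in self.data if p not in points]
        self.unlearnt.extend(points)
        return self._retrain()


def verify(com, model, data, unlearnt, deleted) -> bool:
    """Recompute the commitment and check non-membership of `deleted`."""
    return (com == commitment(model, data, unlearnt)
            and deleted in unlearnt
            and deleted not in data)


server = Server()
m1, com1 = server.add([1.0, 2.0, 6.0])  # iteration 1: training
m2, com2 = server.unlearn([6.0])        # iteration 2: unlearning
assert verify(com2, m2, server.data, server.unlearnt, 6.0)
```

Keeping the unlearnt points in their own append-only list is one simple way to reject later attempts to re-add a deleted point. Whether this matches the paper's exact bookkeeping is an assumption of this sketch.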

Theorems & Definitions (8)

  • Definition 1: Completeness
  • Definition 2: Unlearning
  • Theorem 1
  • Proof (Sketch) of Theorem 1
  • Theorem 2
  • Proof (Sketch) of Theorem 2
  • Proof of Theorem 1 (completeness)
  • Proof of Theorem 2 (security)