Verifiable and Provably Secure Machine Unlearning
Thorsten Eisenhofer, Doreen Riepel, Varun Chandrasekaran, Esha Ghosh, Olga Ohrimenko, Nicolas Papernot
TL;DR
This work recasts machine unlearning as a cryptographic security problem and introduces a formal, iteration-based framework for verifiable unlearning. It provides a general abstraction via admissible functions and proves completeness and security for any instantiation; it then presents a practical SNARK- and hash-chain-based instantiation implemented with Spartan, applicable to retraining-based, amnesiac, and optimization-based unlearning across linear and logistic regression and small neural networks. The approach yields verifiable proofs of training, unlearning, and non-membership, enabling auditors to certify that a deleted data point has been removed and not reintroduced. This framework enables auditable data deletion in regulated contexts and offers a path toward scalable, cryptographically sound unlearning across diverse ML paradigms.
Abstract
Machine unlearning aims to remove points from the training dataset of a machine learning model after training, for example when a user requests that their data be deleted. While many unlearning methods have been proposed, none of them enable users to audit the procedure. Furthermore, recent work shows that a user cannot verify whether their data was unlearnt by inspecting the model parameters alone. Rather than reasoning about parameters, we propose to view verifiable unlearning as a security problem. To this end, we present the first cryptographic definition of verifiable unlearning to formally capture the guarantees of an unlearning system. In this framework, the server first computes a proof that the model was trained on a dataset D. When a user requests deletion of a data point d, the server updates the model using an unlearning algorithm. It then provides a proof that unlearning was executed correctly and that d is not part of D', where D' is the new training dataset (i.e., d has been removed). Our framework is generally applicable to different unlearning techniques, which we abstract as admissible functions. We instantiate a protocol in the framework, based on cryptographic assumptions, using SNARKs and hash chains. Finally, we implement the protocol for three different unlearning techniques and validate its feasibility for linear regression, logistic regression, and neural networks.
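To make the protocol flow in the abstract concrete, the following is a minimal Python sketch of the bookkeeping only, not the authors' implementation. It assumes that hash chains are used to commit to the training set D and to a list of unlearnt points; the abstract does not specify how the chains are used, so this is an illustrative choice. The SNARK is deliberately omitted: the hypothetical `toy_audit` function recomputes the commitments from the raw data, whereas a real verifier would check a succinct proof without seeing the data. All names (`chain_append`, `chain_commit`, `toy_audit`, `GENESIS`) are invented for this example.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value of an empty chain


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so concatenation is unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.digest()


def chain_append(head: bytes, item: bytes) -> bytes:
    """Extend a hash chain by one item and return the new head."""
    return h(head, item)


def chain_commit(items) -> bytes:
    """Commit to an ordered list of items by chaining them from GENESIS."""
    head = GENESIS
    for item in items:
        head = chain_append(head, item)
    return head


def toy_audit(train_items, unlearnt_items, com_train, com_unlearnt) -> bool:
    """Toy stand-in for proof verification (assumption: no SNARK, data revealed).

    Checks that both commitments open to the given lists and that no
    unlearnt point appears in the training set.
    """
    if chain_commit(train_items) != com_train:
        return False
    if chain_commit(unlearnt_items) != com_unlearnt:
        return False
    return set(train_items).isdisjoint(unlearnt_items)


# Initial state: the server commits to dataset D and an empty unlearnt list.
D = [b"alice", b"bob", b"carol"]
U = []
com_D = chain_commit(D)
com_U = chain_commit(U)

# A user requests deletion of d: remove it from D and append it to the chain.
d = b"bob"
D_new = [x for x in D if x != d]
U_new = U + [d]
com_D_new = chain_commit(D_new)
com_U_new = chain_append(com_U, d)

# Honest server passes; a server that keeps d in the training set fails.
ok = toy_audit(D_new, U_new, com_D_new, com_U_new)
cheating = toy_audit(D, U_new, com_D, com_U_new)
print(ok, cheating)
```

Because the unlearnt list only ever grows by appending, the second check also fails if a previously deleted point is later added back to the training set, which mirrors the "not reintroduced" guarantee at a toy level. The sketch provides neither succinctness nor privacy; those properties would come from the SNARK in the actual instantiation.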
