Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable
Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu
TL;DR
The paper investigates privacy risks when performing machine unlearning, showing that removing a training point can create a vulnerability even for simple models such as linear regression. It develops a reconstruction attack (HRec) that exploits changes in model parameters before and after deletion, linking those changes to the deleted data via gradient/Hessian relationships and public-data covariance approximations. The authors extend the attack to fixed embeddings and arbitrary loss functions using Newton-type updates, and validate it across image and tabular tasks with multiple model configurations, achieving near-perfect reconstructions in many settings. The work highlights a significant privacy risk in unlearning procedures and argues for defenses like differential privacy to ensure data autonomy does not come at the cost of exposing individuals’ data through model updates.
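The link between the parameter change and the deleted point can be made concrete for the simplest case. The derivation below is a sketch assuming unregularized ordinary least squares; the notation (X, y, x_0, y_0, C, beta_+, beta_-) is chosen here for illustration and is not necessarily the paper's.

```latex
% Setup (assumed notation): X \in \mathbb{R}^{n \times d}, y \in \mathbb{R}^{n},
% deleted pair (x_0, y_0), C = X^\top X, C_- = C - x_0 x_0^\top.
\begin{align*}
\beta_+ &= C^{-1} X^\top y
  && \text{(model before deletion)} \\
\beta_- &= C_-^{-1}\left(X^\top y - x_0 y_0\right)
  && \text{(model after deletion)} \\
C_- \beta_- &= X^\top y - x_0 y_0
  = C \beta_+ - x_0 y_0
  = C_- \beta_+ + x_0\left(x_0^\top \beta_+ - y_0\right) \\
\Rightarrow\quad C_- \left(\beta_- - \beta_+\right)
  &= \left(x_0^\top \beta_+ - y_0\right) x_0
\end{align*}
```

Under these assumptions, the vector C_-(beta_- - beta_+) is a scalar multiple of the deleted features x_0, where the scalar is the residual of the original model on the deleted point. An attacker who sees both parameter vectors and can approximate C_- (up to scale) from public samples of the same distribution therefore obtains the direction of x_0; this identity alone does not determine the scale.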
Abstract
Machine unlearning is motivated by a desire for data autonomy: a person can request to have their data's influence removed from deployed models, and those models should be updated as if they were retrained without the person's data. We show that, counter-intuitively, these updates expose individuals to high-accuracy reconstruction attacks that allow the attacker to recover their data in its entirety, even when the original models are so simple that privacy risk might not otherwise have been a concern. We show how to mount a near-perfect attack on the deleted data point from linear regression models. We then generalize our attack to other loss functions and architectures, and empirically demonstrate the effectiveness of our attacks across a wide range of datasets (capturing both tabular and image data). Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.
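As a rough numerical illustration of the linear regression case, the following minimal sketch retrains an ordinary least squares model without one point and tries to recover that point's direction from the two parameter vectors. It is not the authors' implementation: the synthetic Gaussian data, the helper names (`fit_ols`, `reconstruct_direction`, `cosine`), and the choice to score only the direction via cosine similarity are all assumptions made for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Unregularized least-squares fit (assumed model for this sketch)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def reconstruct_direction(beta_before, beta_after, cov_estimate):
    """Estimate the deleted point's direction (hypothetical helper).

    For OLS, C_minus @ (beta_after - beta_before) is a scalar multiple of
    the deleted feature vector, where C_minus is the Gram matrix of the
    remaining data. Any estimate of C_minus up to scale can be plugged in.
    """
    direction = cov_estimate @ (beta_after - beta_before)
    return direction / np.linalg.norm(direction)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic data: an assumption for illustration, not a dataset from the paper.
rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Model before deletion, and model retrained without row 0.
beta_before = fit_ols(X, y)
beta_after = fit_ols(X[1:], y[1:])

# (a) Idealized attacker who knows the Gram matrix of the remaining data.
C_minus = X[1:].T @ X[1:]
x_hat_exact = reconstruct_direction(beta_before, beta_after, C_minus)

# (b) Attacker who only has fresh "public" samples from the same distribution.
X_pub = rng.normal(size=(5000, d))
C_pub = X_pub.T @ X_pub / len(X_pub)
x_hat_pub = reconstruct_direction(beta_before, beta_after, C_pub)

# The sign of the recovered vector depends on the residual, so compare |cos|.
sim_exact = abs(cosine(x_hat_exact, X[0]))
sim_pub = abs(cosine(x_hat_pub, X[0]))
print(f"exact Gram matrix: |cos| = {sim_exact:.6f}")
print(f"public estimate:   |cos| = {sim_pub:.6f}")
```

In this sketch the recovered vector is only defined up to sign and scale, which is why the comparison uses the absolute cosine similarity. The quality of the public-data variant depends on how closely the public covariance matches that of the private training set.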
