Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu

TL;DR

The paper investigates privacy risks when performing machine unlearning, showing that removing a training point can create a vulnerability even for simple models such as linear regression. It develops a reconstruction attack (HRec) that exploits changes in model parameters before and after deletion, linking those changes to the deleted data via gradient/Hessian relationships and public-data covariance approximations. The authors extend the attack to fixed embeddings and arbitrary loss functions using Newton-type updates, and validate it across image and tabular tasks with multiple model configurations, achieving near-perfect reconstructions in many settings. The work highlights a significant privacy risk in unlearning procedures and argues for defenses like differential privacy to ensure data autonomy does not come at the cost of exposing individuals’ data through model updates.

Abstract

Machine unlearning is motivated by a desire for data autonomy: a person can request to have their data's influence removed from deployed models, and those models should be updated as if they were retrained without the person's data. We show that, counter-intuitively, these updates expose individuals to high-accuracy reconstruction attacks that allow the attacker to recover their data in its entirety, even when the original models are so simple that privacy risk might not otherwise have been a concern. We show how to mount a near-perfect attack on the deleted data point from linear regression models. We then generalize our attack to other loss functions and architectures, and empirically demonstrate the effectiveness of our attacks across a wide range of datasets (capturing both tabular and image data). Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.
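To make the linear regression case concrete, here is a minimal sketch of why a parameter update can leak the deleted point. It is not the authors' implementation of HRec; it assumes unregularized least squares, exact retraining after deletion, and synthetic Gaussian data. Under those assumptions the normal equations give $C(\beta' - \beta) = (x^\top \beta' - y)\,x$, where $C = X^\top X$ is the Gram matrix of the original training set, $\beta$ and $\beta'$ are the parameters before and after deletion, and $(x, y)$ is the deleted sample. Multiplying the parameter difference by $C$, or by an estimate of it from public data, therefore recovers the direction of $x$ up to sign. All names in the code (`fit_ols`, `X_pub`, `x_hat_exact`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10

# Synthetic private training set; the last column is a constant intercept feature.
X = np.hstack([rng.normal(size=(n, d - 1)), np.ones((n, 1))])
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)


def fit_ols(X, y):
    """Least-squares fit via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Model before deletion, and after exact retraining without sample 0.
beta_before = fit_ols(X, y)
beta_after = fit_ols(X[1:], y[1:])
x_deleted = X[0]
delta = beta_after - beta_before

# Idealized attacker (assumption): knows the exact Gram matrix C = X^T X.
x_hat_exact = (X.T @ X) @ delta

# Weaker attacker (assumption): estimates C, up to scale, from public
# samples drawn from the same distribution as the private data.
X_pub = np.hstack([rng.normal(size=(20000, d - 1)), np.ones((20000, 1))])
x_hat_public = (X_pub.T @ X_pub) @ delta

# The direction is recovered only up to sign, so report |cosine similarity|.
cos_exact = abs(cosine(x_hat_exact, x_deleted))
cos_public = abs(cosine(x_hat_public, x_deleted))
print(f"exact Gram matrix: {cos_exact:.4f}")
print(f"public estimate:   {cos_public:.4f}")
```

The sketch recovers only a direction. Fixing the sign and scale needs extra information, for example a feature whose value is known in advance (such as the constant intercept column used here).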

Paper Structure (20 sections, 16 equations, 14 figures, 2 tables, 1 algorithm)

Figures (14)

  • Figure 1: We conduct membership inference attacks on a ridge regression model trained on the ACS Income task; the attack performance is poor (close to random guessing).
  • Figure 2: CIFAR10 samples reconstructed from a logistic regression model over a $4096$-dimensional random Fourier feature embedding of the raw input. We randomly chose one deleted sample per label (Row 1) and compared them against the reconstructed sample using our method (HRec, Row 2) and a perturbation baseline (MaxDiff, Row 3), which searches for the public sample with the largest prediction difference before and after sample deletion. HRec produces reconstructions similar to the deleted images, both visually and quantitatively as measured by cosine similarity.
  • Figure 3: Cumulative distribution function of cosine similarity between the deleted and reconstructed samples for the average, MaxDiff, and HRec (ours) attacks on MNIST, Fashion MNIST and CIFAR10 for three model architectures (linear cross-entropy, ridge regression over $4096$ random Fourier features, and cross-entropy over $4096$ random Fourier features). Lower curves correspond to more effective attacks than higher curves. Our attack achieves better cosine similarity with the deleted sample across all settings; the effect is especially apparent in the denser CIFAR10 dataset.
  • Figure 4: Sample reconstructions on Fashion MNIST / MNIST for a $40K$ parameter model (cross-entropy over random Fourier features of the raw input). We randomly chose one deleted sample per label (Rows 1, 4) and compared them against the reconstructed sample using our method (HRec, Rows 2, 5) and a perturbation baseline (MaxDiff, Rows 3, 6), which searches for the public sample with the largest prediction difference before and after sample deletion. HRec produces reconstructions that are highly similar to the deleted images.
  • Figure 5: ACS Income Regression. Target models are ridge regression with tuned hyperparameters on the original features (first row) and over random Fourier features (second row). ACS Income data from three states are used to demonstrate the effectiveness of our attack. Given the analytical single-sample update rules of linear regression, our attack (HRec) reconstructs the deleted sample almost perfectly across all datasets and embedding functions.
  • ...and 9 more figures
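
The MaxDiff baseline mentioned in the captions of Figures 2 and 4 is described only as a search for the public sample with the largest prediction difference before and after deletion. A minimal sketch of that idea follows. The function name `max_diff`, the use of a summed absolute difference as the score, and the toy linear models are all assumptions made for illustration; the captions do not specify how the difference is measured.

```python
import numpy as np


def max_diff(predict_before, predict_after, X_public):
    """Return (index, sample) of the public sample whose prediction
    changes the most between the two models.

    Scoring by summed absolute difference is an assumption of this sketch.
    """
    diff = np.abs(predict_before(X_public) - predict_after(X_public))
    # Collapse any output dimensions so each sample has a single score.
    scores = diff.reshape(len(X_public), -1).sum(axis=1)
    idx = int(np.argmax(scores))
    return idx, X_public[idx]


# Toy usage with two hypothetical linear models that differ in one weight.
X_public = np.array([[1.0, 0.0], [0.0, 3.0], [2.0, 1.0]])
w_before = np.array([1.0, 0.0])
w_after = np.array([0.5, 0.0])

idx, candidate = max_diff(
    lambda X: X @ w_before,
    lambda X: X @ w_after,
    X_public,
)
print(idx, candidate)
```

Because this baseline can only return an existing public sample, its output is limited by how close the nearest public sample happens to be to the deleted one.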