Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Liou Tang, James Joshi, Ashish Kundu

TL;DR

This work tackles privacy risks in Machine Unlearning (MU) by introducing Apollo, an A Posteriori Label-Only Membership Inference Attack that operates under a strict black-box threat model using only the unlearned model’s output labels. It formalizes Under-Unlearning and Over-Unlearning as artifacts of the unlearning process and combines shadow models with adversarial input generation to detect membership without access to the original model or to posteriors, which distinguishes it from prior attacks on MU. Empirical results on CIFAR-10, CIFAR-100, and ImageNet across several MU algorithms show that Apollo achieves high inference precision at low false-positive rates, challenging claims that MU protects privacy. The findings, which cover both online and offline attack variants and comprehensive ablations, motivate tighter defenses for MU and point to future research on balancing data deletion guarantees with robust privacy protections.

Abstract

Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently, without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks on MU that aim to infer properties of the unlearned set rely on a weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility in real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU (Apollo), that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model than previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.
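To make the label-only threat model concrete, the sketch below wraps a classifier so that a caller sees only the predicted class, never the scores, and then estimates how far a sample can be moved along a direction before that label changes. It illustrates the access pattern only and is not Apollo's procedure; the linear classifier and the names `make_label_oracle` and `flip_distance` are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def make_label_oracle(weights: List[Vector], biases: Vector) -> Callable[[Vector], int]:
    """Wrap a hypothetical linear classifier so callers see only the argmax label."""
    def oracle(x: Vector) -> int:
        scores = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)
        ]
        # Only the index of the top score leaves this function (label-only access).
        return max(range(len(scores)), key=scores.__getitem__)
    return oracle


def flip_distance(
    oracle: Callable[[Vector], int],
    x: Vector,
    direction: Vector,
    max_dist: float = 10.0,
    tol: float = 1e-6,
) -> Optional[float]:
    """Bisect along `direction` for a point where the label differs from the label at x.

    Returns None if the label at distance `max_dist` is unchanged.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]

    def point(t: float) -> List[float]:
        return [x_i + t * u for x_i, u in zip(x, unit)]

    base_label = oracle(x)
    if oracle(point(max_dist)) == base_label:
        return None
    lo, hi = 0.0, max_dist  # label at lo equals base_label, label at hi differs
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if oracle(point(mid)) == base_label:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Two classes separated by the line x[0] = 0.
    oracle = make_label_oracle([[1.0, 0.0], [-1.0, 0.0]], [0.0, 0.0])
    print(flip_distance(oracle, [-2.0, 0.0], [1.0, 0.0]))
```

Every piece of information the caller obtains here comes from repeated label queries, which is the only channel available to an adversary under the strict threat model described above.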

Paper Structure

This paper contains 17 sections, 4 theorems, 19 equations, 6 figures, 7 tables, 1 algorithm.

Key Result

Lemma 3.1

Let $m_\theta(x)$ be $L_x$-Lipschitz in $x$ and $L_\theta$-Lipschitz in $\theta$; then, for any $x, x^\prime$ and $\theta, \theta^\prime$, we have:
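Under these two assumptions, the triangle inequality yields the bound below. This is a sketch of the standard argument, not a quotation of the paper's statement; the norms are assumed to be the ones in which $L_x$ and $L_\theta$ are defined.

```latex
\begin{aligned}
\left| m_\theta(x) - m_{\theta^\prime}(x^\prime) \right|
  &\le \left| m_\theta(x) - m_\theta(x^\prime) \right|
     + \left| m_\theta(x^\prime) - m_{\theta^\prime}(x^\prime) \right| \\
  &\le L_x \, \lVert x - x^\prime \rVert + L_\theta \, \lVert \theta - \theta^\prime \rVert .
\end{aligned}
```

The first term uses Lipschitzness in $x$ at fixed $\theta$, and the second uses Lipschitzness in $\theta$ at fixed $x^\prime$.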

Figures (6)

  • Figure 1: An overview of our proposed Apollo attack. For the target sample whose membership status we are interested in, we generate an adversarial input $x^\prime$ under the conjectures of Under-Unlearning and Over-Unlearning; the target model's predicted label on the adversarial input is used to infer whether the target sample $x$ is unlearned.
  • Figure 2: We show the dynamics of MU on the decision boundary: (a) Before unlearning (the original model $\theta$), (b) Under-Unlearning (Conj. 1) and (c) Over-Unlearning (Conj. 2). In each scenario, the solid and dotted lines represent the decision boundaries of the (approximately) unlearned model $\theta_u$ and a retrained model $\theta_r$, respectively.
  • Figure 3: We test Conj. 1 and Conj. 2 on a $(x, y) \in \mathbb{R}^2 \times \{0, 1, 2, 3\}$ example. In the toy-data panel, the four classes and the unlearned set are identified in the legend. Regions of Under-Unlearning and Over-Unlearning are colored red and green, respectively. In the adversarial-input panel, the trajectory indicates the adversarial input $x^\prime$ at each iteration.
  • Figure 4: Attack ROCs against various unlearning algorithms.
  • Figure 5: Attack TPRs with Under-Unlearning (Conj. 1) and Over-Unlearning (Conj. 2) at different step sizes against various unlearning algorithms.
  • ...and 1 more figure
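Figures 4 and 5 report attack ROCs and TPRs. As a minimal sketch of how a TPR at a fixed false-positive budget can be read off from per-sample attack scores, the function below sweeps thresholds and keeps the best TPR whose FPR stays within the budget. The function name `tpr_at_fpr`, the score convention (higher means "more likely unlearned"), and the example scores are assumptions for illustration, not values from the paper.

```python
from typing import Sequence


def tpr_at_fpr(
    positive_scores: Sequence[float],
    negative_scores: Sequence[float],
    max_fpr: float,
) -> float:
    """Return the highest TPR achievable by a threshold whose FPR is at most `max_fpr`.

    A sample is predicted positive (here: "unlearned") when its score >= threshold.
    """
    # Candidate thresholds, from strictest to most permissive.
    thresholds = sorted(set(positive_scores) | set(negative_scores), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in negative_scores) / len(negative_scores)
        if fpr > max_fpr:
            break  # FPR only grows as the threshold decreases
        tpr = sum(s >= t for s in positive_scores) / len(positive_scores)
        best_tpr = max(best_tpr, tpr)
    return best_tpr


if __name__ == "__main__":
    unlearned = [0.9, 0.8, 0.4, 0.3]   # hypothetical scores for unlearned samples
    others = [0.7, 0.2, 0.1, 0.05]     # hypothetical scores for all other samples
    print(tpr_at_fpr(unlearned, others, max_fpr=0.0))   # 0.5
    print(tpr_at_fpr(unlearned, others, max_fpr=0.25))  # 1.0
```

Reporting TPR at a small FPR budget, rather than average accuracy, reflects how confidently an attack can flag individual samples without raising false alarms.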

Theorems & Definitions (12)

  • Definition 1: Machine Learning [Chourasia2023True, Tang2024Taxonomy]
  • Definition 2: Machine Unlearning [Chourasia2023True, Tang2024Taxonomy]
  • Definition 3: Membership Inference Game on Machine Unlearning [Carlini2022MIA, Hayes2024Inexact]
  • Conjecture 1: Under-Unlearning
  • Conjecture 2: Over-Unlearning
  • Lemma 3.1: Lipschitzness of the Margin
  • proof
  • Lemma 3.2: Certified Label-Invariance Radius
  • Theorem 3.3: Bounds for Under-Unlearning
  • proof
  • ...and 2 more