ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios

Yuval Golbari, Navve Wasserman, Gal Vardi, Michal Irani

TL;DR

ImpMIA addresses a practical gap in membership inference by exploiting the implicit bias of gradient descent in white-box settings, removing dependence on reference models and their often unrealistic assumptions. By representing the trained parameter vector as a sum of margin-gradients weighted by per-sample coefficients, ImpMIA optimizes these coefficients over a candidate superset to identify training members. The method demonstrates state-of-the-art performance in realistic scenarios with unknown training configurations, data distributions, and member ratios, especially at ultra-low false-positive rates, across CIFAR-10, CIFAR-100, and CINIC-10 using ResNet-18. This work advances privacy auditing for publicly released models and highlights a practical privacy vulnerability enabled by implicit bias, with broader implications for model release and defense strategies.
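
The coefficient-fitting idea can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a linear binary classifier in which the margin-gradient of sample $i$ is simply $y_i x_i$, builds a parameter vector from the members' gradients by construction, and recovers per-sample coefficients with projected gradient descent. The function name `fit_coefficients`, the step count, and all sizes are hypothetical choices made for illustration; the paper's experiments use ResNet-18, and its actual objective and optimizer are not described in this section.

```python
import numpy as np

def fit_coefficients(theta, grads, steps=2000):
    """Minimize ||theta - grads.T @ lam||^2 subject to lam >= 0 (toy sketch)."""
    lam = np.zeros(grads.shape[0])
    # Step size 1/L, where L = 2 * sigma_max(grads)^2 bounds the Hessian norm.
    lipschitz = 2.0 * np.linalg.norm(grads, ord=2) ** 2
    for _ in range(steps):
        residual = grads.T @ lam - theta           # reconstruction error
        lam = lam - (2.0 * grads @ residual) / lipschitz
        lam = np.maximum(lam, 0.0)                 # project onto lam >= 0
    return lam

# Toy setup (assumption): linear model, so the margin-gradient is y_i * x_i.
rng = np.random.default_rng(0)
dim, n_candidates, n_members = 50, 20, 8
x = rng.normal(size=(n_candidates, dim))
y = rng.choice([-1.0, 1.0], size=n_candidates)
margin_grads = y[:, None] * x

# The first n_members candidates are the "training set" of the toy model.
is_member = np.zeros(n_candidates, dtype=bool)
is_member[:n_members] = True

# Construct theta as a positively weighted sum of member margin-gradients.
true_lam = rng.uniform(0.5, 1.5, size=n_members)
theta = margin_grads[is_member].T @ true_lam

# Fit one coefficient per candidate; members should receive the large ones.
scores = fit_coefficients(theta, margin_grads)
print("mean member score:    ", scores[is_member].mean())
print("mean non-member score:", scores[~is_member].mean())
```

In this toy case the candidate gradients are linearly independent, so the fit recovers the members' coefficients and drives the non-members' coefficients to zero. A real network offers no such guarantee, which is why the fitted coefficients are treated as membership scores rather than exact labels.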

Abstract

Determining which data samples were used to train a model, known as a Membership Inference Attack (MIA), is a well-studied and important problem with implications for data privacy. Black-box methods presume access only to the model's outputs and often rely on training auxiliary reference models. While they have shown strong empirical performance, they rely on assumptions that rarely hold in real-world settings: (i) the attacker knows the training hyperparameters; (ii) all available non-training samples come from the same distribution as the training data; and (iii) the fraction of training data in the evaluation set is known. In this paper, we demonstrate that removing these assumptions leads to a significant drop in the performance of black-box attacks. We introduce ImpMIA, a Membership Inference Attack that exploits the Implicit Bias of neural networks and thus removes the need to rely on any reference models and their assumptions. ImpMIA is a white-box attack, a setting that assumes access to model weights and is becoming increasingly realistic given that many models are publicly available (e.g., via Hugging Face). Building on maximum-margin implicit bias theory, ImpMIA uses the Karush-Kuhn-Tucker (KKT) optimality conditions to identify training samples. This is done by finding the samples whose gradients most strongly reconstruct the trained model's parameters. As a result, ImpMIA achieves state-of-the-art performance compared to both black-box and white-box attacks in realistic settings where only the model weights and a superset of the training data are available.
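
For orientation, the KKT conditions referred to here can be written out in their standard binary-classification form from maximum-margin implicit bias theory. This is a sketch under stated assumptions, not the paper's exact formulation: $\Phi(\theta; x)$ denotes the network output, $y_i \in \{-1, +1\}$ the labels, and $\lambda_i$ the per-sample coefficients. The symbols $\Phi$, $y_i$, and $n$ are notation introduced here; the multiclass version used in the paper is not reproduced in this section. For the problem of minimizing $\tfrac{1}{2}\lVert\theta\rVert^2$ subject to $y_i \Phi(\theta; x_i) \ge 1$ over the training samples, the conditions read:

$$
\theta = \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_\theta \Phi(\theta; x_i), \qquad \lambda_i \ge 0, \qquad \lambda_i \bigl( y_i \Phi(\theta; x_i) - 1 \bigr) = 0 \quad \text{for all } i .
$$

The first condition is what makes the attack possible: with $\theta$ known and each gradient computable, the coefficients $\lambda_i$ are the only unknowns. The theory establishes this for homogeneous networks and only in terms of the direction of $\theta$, so for a practical model it is best read as an approximation.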

Paper Structure

This paper contains 32 sections, 11 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Overview of the approach. (a) Setting: Given a trained model with parameters $\theta$ and a candidate superset containing training members (blue) and non-members (orange), the adversary’s goal is to identify which samples are members. (b) KKT conditions: Our attack builds on implicit bias theory, which shows that gradient-based optimization converges to solutions satisfying the Karush–Kuhn–Tucker (KKT) conditions of the maximum-margin problem. Since weights are known and gradients are computable, only the coefficients remain unknown. (c) ImpMIA: We optimize one coefficient per sample to best reconstruct the model parameters, where members are expected to receive large coefficients and non-members small ones.
  • Figure 2: Lambda Scores Visualization. Scatter plot of superset samples, with the x-axis showing distance from the decision boundary and the y-axis showing $\lambda$ scores; points are colored by membership (member vs. non-member). High $\lambda$ strongly indicates membership.
  • Figure 3: Effect of Assumption Removal. Performance of the best prior method (LiRA) and our method (ImpMIA) under the progressive removal of assumptions on CINIC-10. LiRA degrades sharply as assumptions are removed, while ImpMIA maintains stable performance.
  • Figure S1: Lambda Scores Visualization (CIFAR-100). Scatter plots for six different classes. Each plot shows superset samples; x-axis is distance from the decision boundary, y-axis is the $\lambda$ score, and points are colored by membership (member vs. non-member). High $\lambda$ values strongly indicate membership.
  • Figure S2: TPR–FPR plots for the no-assumption combined setting. These curves illustrate attack performance when the attacker faces realistic uncertainty: (i) training hyperparameters are unknown, (ii) the candidate pool mixes in- and out-of-distribution samples (distribution shift), and (iii) the fraction of members is unknown. The plots complement the main text by showing the full ROC behavior, especially in the low-FPR regime.
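
Since the evaluation emphasizes the low-FPR regime, it may help to see how a true-positive rate at a fixed false-positive rate can be computed from per-sample membership scores such as $\lambda$. The helper below is a minimal sketch, not the paper's evaluation code: the name `tpr_at_fpr` and the thresholding convention (strictly greater than the threshold, which keeps the realized FPR at or below the target) are assumptions made for illustration, and the scores are made-up numbers.

```python
import numpy as np

def tpr_at_fpr(scores, is_member, target_fpr):
    """TPR at the strictest threshold whose FPR does not exceed target_fpr."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    # Non-member scores sorted from highest to lowest.
    negatives = np.sort(scores[~is_member])[::-1]
    # Number of non-members we may wrongly flag as members.
    allowed = int(np.floor(target_fpr * negatives.size))
    # Flag a sample as a member only if its score exceeds this threshold.
    threshold = negatives[allowed] if allowed < negatives.size else -np.inf
    return float(np.mean(scores[is_member] > threshold))

# Made-up scores: four members followed by four non-members.
example_scores = [0.35, 0.5, 0.05, 0.9, 0.1, 0.2, 0.3, 0.4]
example_labels = [True, True, True, True, False, False, False, False]

print(tpr_at_fpr(example_scores, example_labels, 0.25))  # → 0.75
print(tpr_at_fpr(example_scores, example_labels, 0.0))   # → 0.5
```

At a target FPR of 0.25, one of the four non-members may be misclassified, so the threshold is the second-highest non-member score (0.3) and three of the four members exceed it. At a target FPR of 0, the threshold rises to the highest non-member score (0.4) and only two members remain above it.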