Fundamental Limits of Membership Inference Attacks on Machine Learning Models

Eric Aubinais; Elisabeth Gassiat; Pablo Piantanida

TL;DR

This paper develops a theory for membership inference attacks (MIAs) that is model-agnostic and statistically grounded. It introduces the central quantity $\Delta_{\nu,\lambda,n}(P,\mathcal{A})$, an $f$-divergence that bounds the best possible MIA accuracy and thus defines Membership Inference Security (MIS) for a learning procedure. The authors prove that overfitting can leave a learning procedure with little security. For empirical-mean learners and for discrete data, in contrast, $\Delta_{\nu,\lambda,n}(P,\mathcal{A})$ can be controlled with explicit rates, notably $O(n^{-1/2})$. Discretizing the data can improve privacy without heavily sacrificing accuracy, with the leakage bounded through a constant $C_K(P)$ that quantifies the diversity of the underlying data distribution. Numerical experiments corroborate the theory, showing that overfitting enables highly effective MIAs and that discretization reduces leakage. The work also clarifies the relation to differential privacy (DP), positioning MIS as a complementary, attack-centric privacy metric rather than a DP guarantee.

Abstract

Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article provides theoretical guarantees by exploring the fundamental statistical limitations associated with MIAs on machine learning models at large. More precisely, we first derive the statistical quantity that governs the effectiveness and success of such attacks. We then theoretically prove that in a non-linear regression setting with overfitting learning procedures, attacks may have a high probability of success. Finally, we investigate several situations for which we provide bounds on this quantity of interest. Interestingly, our findings indicate that discretizing the data might enhance the learning procedure's security. Specifically, it is demonstrated to be limited by a constant, which quantifies the diversity of the underlying data distribution. We illustrate those results through simple simulations.
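The overfitting claim can be illustrated with a small simulation. The sketch below is not the paper's experiment: it assumes a hypothetical one-dimensional regression task (`y = x^2` plus Gaussian noise), a 1-nearest-neighbor learner that interpolates its training set, a balanced membership prior, and a simple loss-threshold attack. All function names and parameters are illustrative.

```python
import random


def sample(n, rng):
    # Hypothetical non-linear regression data: y = x^2 + Gaussian noise.
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        points.append((x, x * x + rng.gauss(0.0, 0.5)))
    return points


def one_nn_predict(train, x):
    # Interpolating learner: return the label of the closest training input.
    return min(train, key=lambda pt: abs(pt[0] - x))[1]


def loss_threshold_attack_accuracy(n=200, threshold=1e-9, seed=0):
    rng = random.Random(seed)
    train = sample(n, rng)   # members
    fresh = sample(n, rng)   # non-members drawn from the same distribution

    def guess_member(pt):
        # Attack rule: guess "member" when the squared loss is (near) zero.
        x, y = pt
        return (one_nn_predict(train, x) - y) ** 2 <= threshold

    # Balanced prior (an assumption): half the test points are members.
    correct = sum(guess_member(p) for p in train)
    correct += sum(not guess_member(p) for p in fresh)
    return correct / (2 * n)


if __name__ == "__main__":
    print(loss_threshold_attack_accuracy())
```

Because the learner reproduces every training label exactly, the loss on a member is zero while the loss on a fresh point almost never is, so this attack separates the two groups almost perfectly in this toy setting.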

Paper Structure (27 sections, 23 theorems, 156 equations, 2 figures, 1 table)

Introduction
Contributions
Related Works
Background and Problem Setup
Performance Assessment of Membership Inference Attacks
Overfitting Causes Lack of Security
Security is Data Size Dependent
Empirical Mean based Learning Procedures
Discrete Data Distribution
Numerical Experiments
Overfitting
Impact of $C_K(P)$ on accuracy
Summary and Discussion
More comments on Section \ref{prob_form}
More comments on Overfitting
...and 12 more sections

Key Result

Proposition 5

The map $(P,Q)\mapsto D_\alpha(P,Q)$ is an $f$-divergence between $P$ and $Q$ with generator $f_{\alpha}(x) = \frac{1}{2}\max(1,\alpha)\left[|x-1/\alpha| - |1-1/\alpha|\right]$. Additionally, it holds that if $x_1$ and $x_2$ are random variables with joint distribution $\mathbb{P}_{(x_1,x_2)}$, then for any function $f$ ...
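As a sanity check on the generator $f_\alpha$, the following minimal sketch evaluates $D_\alpha$ on finite distributions. It assumes the standard $f$-divergence convention $D_f(P,Q)=\sum_i q_i\, f(p_i/q_i)$ with $q_i>0$ and $\alpha>0$; the truncated statement above does not show which convention the paper uses, and the function names are illustrative.

```python
def f_alpha(x, alpha):
    # Generator from Proposition 5; note that f_alpha(1) = 0 for every alpha > 0.
    return 0.5 * max(1.0, alpha) * (abs(x - 1.0 / alpha) - abs(1.0 - 1.0 / alpha))


def d_alpha(p, q, alpha):
    # Assumed convention: D_f(P, Q) = sum_i q_i * f(p_i / q_i), with all q_i > 0.
    return sum(qi * f_alpha(pi / qi, alpha) for pi, qi in zip(p, q))


def total_variation(p, q):
    # Total variation distance between two finite distributions.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.3, 0.5]
    print(d_alpha(p, q, 1.0), total_variation(p, q))
```

Under this convention, $\alpha = 1$ gives $f_1(x) = \frac{1}{2}|x-1|$, so $D_1$ coincides with the total variation distance, which the example checks numerically.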

Figures (2)

Figure 1: Shows the fraction of the Training/Validation dataset whose loss is under given thresholds during the training process. The left figure shows the training accuracy, and the right figure shows the validation accuracy.
Figure 2: Shows the fraction of the Validation dataset whose loss is under given thresholds at the end of the training process for different dimensions. The left figure shows the validation accuracy for the linear regression model, and the right figure shows the validation accuracy for the nearest neighbors model.

Theorems & Definitions (38)

Definition 1: Membership Inference Attack - MIA
Definition 2: Accuracy of an MIA
Definition 3: Membership Inference Security - MIS
Remark 4: Model-specific attack - limitations of this approach
Proposition 5
Theorem 6: Key bound on accuracy
Remark 7: Relation with other divergences
Remark 8: Differential Privacy
Definition 9: $(\varepsilon,1-\alpha)$-Overfitting
Proposition 10
...and 28 more
