Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

Samuel Yeom, Irene Giacomelli, Matt Fredrikson, Somesh Jha

TL;DR

The paper formalizes how overfitting and feature influence drive privacy risks in ML, connecting membership and attribute inference through rigorous advantage definitions and reductions. It proves that differential privacy and stability can bound, but not fully eliminate, membership leakage, and it shows how attribute inference can both depend on and imply membership information. Through analytical results and extensive experiments on linear, tree, and CNN models, the work demonstrates practical privacy risks even when models generalize well, and reveals the surprising possibility of collusion and algorithm substitution attacks that leak training data without harming predictive performance. The findings highlight the need for robust defenses beyond overfitting mitigation and provide a framework for evaluating privacy risk in real ML deployments.

Abstract

Machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through the models' structure or their observable behavior. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts that suggest overfitting and influence might play a role. This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks. Using both formal and empirical analyses, we illustrate a clear relationship between these factors and the privacy risk that arises in several popular machine learning algorithms. We find that overfitting is sufficient to allow an attacker to perform membership inference and, when the target attribute meets certain conditions about its influence, attribute inference attacks. Interestingly, our formal analysis also shows that overfitting is not necessary for these attacks and begins to shed light on what other factors may be in play. Finally, we explore the connection between membership inference and attribute inference, showing that there are deep connections between the two that lead to effective new attacks.

Paper Structure

This paper contains 35 sections, 7 theorems, 32 equations, 6 figures, 1 table.

Key Result

Theorem 1

Let $A$ be an $\epsilon$-differentially private learning algorithm and $\mathcal{A}$ be a membership adversary. Then we have:

Figures (6)

  • Figure 1: The advantage of the general attribute adversary as a function of $t$'s influence $\tau$. Here $t$ is a uniformly distributed binary variable.
  • Figure 2: Empirical membership advantage of the threshold adversary as a function of generalization ratio for regression, tree, and CNN models.
  • Figure 3: Experimentally determined advantage for various membership and attribute adversaries. The plots correspond to: (a) threshold membership adversary, (b) uniform reduction adversary, (c) general attribute adversary, and (d) multi-query reduction adversary. Both reduction adversaries use the threshold membership adversary as the oracle, and $f_\mathcal{A}(\epsilon)$ for the attribute adversary is the Gaussian with mean zero and standard deviation $\sigma_S$.
  • Figure 4: Results of colluding training algorithm and membership adversary on CNNs trained on MNIST, CIFAR-10, and CIFAR-100. The size parameter was configured to take values $s=2^i$ for $i \in [0,7]$. Regardless of the models' generalization performance, when the network is sufficiently large, the attack achieves high advantage ($\ge 0.98$) without affecting predictive accuracy.
  • Figure 5: The training and test error distributions for an overfitted Ridge regression model. The histograms are juxtaposed with what we would expect if the errors were normally distributed with standard deviation $R_{emp} = 0.2774$ and $R_{cv} = 0.8884$, respectively. Note the different vertical scale for the two graphs. To minimize the effect of noise, the errors were measured using 1000 different random 75-25 splits of the data into training and test sets and then aggregated.
  • ...and 1 more figure

Theorems & Definitions (20)

  • Definition 1: On-Average-Replace-One (ARO) Stability
  • Definition 2: Differential privacy
  • Definition 3: Average generalization error
  • Definition 4: Membership advantage
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • ...and 10 more
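
As a rough illustration of how the membership advantage (Definition 4) relates to overfitting, the sketch below estimates the advantage of a simple loss-threshold adversary as the rate of "member" guesses on training points minus the rate on held-out points. This is a simplified sketch under stated assumptions, not the paper's exact adversary: the function name `membership_advantage`, the fixed threshold of 0.5, and the synthetic Gaussian errors are all hypothetical. The two standard deviations reuse the $R_{emp}$ and $R_{cv}$ values quoted in the Figure 5 caption purely as example inputs.

```python
import numpy as np


def membership_advantage(member_losses, nonmember_losses, threshold):
    """Empirical advantage of a loss-threshold adversary.

    The adversary guesses "member" when the loss is at most `threshold`.
    Advantage = Pr[guess member | member] - Pr[guess member | non-member].
    """
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    true_positive_rate = np.mean(member_losses <= threshold)
    false_positive_rate = np.mean(nonmember_losses <= threshold)
    return float(true_positive_rate - false_positive_rate)


# Synthetic absolute errors (assumption): Gaussian with the training and
# test standard deviations quoted in the Figure 5 caption.
rng = np.random.default_rng(0)
train_err = np.abs(rng.normal(0.0, 0.2774, size=5000))
test_err = np.abs(rng.normal(0.0, 0.8884, size=5000))

# Overfitted case: training errors are much smaller than test errors.
adv_overfit = membership_advantage(train_err, test_err, threshold=0.5)

# Degenerate case: identical loss distributions give the adversary nothing.
adv_none = membership_advantage(train_err, train_err, threshold=0.5)

print(f"advantage with overfitting gap: {adv_overfit:.3f}")
print(f"advantage with no gap:          {adv_none:.3f}")
```

When the two loss distributions coincide the estimated advantage is zero, and it grows as the gap between training and test error widens, which matches the abstract's claim that overfitting is sufficient for membership inference.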