Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?

Andrew Lowy, Zhuohang Li, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

TL;DR

This work investigates why differential privacy with large privacy parameters can defend against practical membership inference attacks. It introduces practical membership privacy (PMP), a subpopulation-focused, average-case privacy notion capturing an attacker who lacks certainty about most private data. The authors derive PMP bounds for the exponential and Gaussian DP mechanisms, showing that large DP parameters can yield small PMP guarantees (e.g., $\tilde{\varepsilon}(X)$ substantially less than $\varepsilon$ for many subpopulations), which helps explain empirical defenses observed in practice. They provide a principled framework and quantitative guidance for practitioners to choose DP parameters by considering the PMP of their data subpopulations, outliers, and problem dimensions, while acknowledging PMP’s limitations in sequential composition and dynamic knowledge. Overall, the results bridge theory and practice by translating DP guarantees into interpretable, data-dependent privacy protection against realistic MIAs.

Abstract

For small privacy parameter $\varepsilon$, $\varepsilon$-differential privacy (DP) provides a strong worst-case guarantee that no membership inference attack (MIA) can succeed at determining whether a person's data was used to train a machine learning model. The guarantee of DP is worst-case because: a) it holds even if the attacker already knows the records of all but one person in the data set; and b) it holds uniformly over all data sets. In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set. Such considerations have motivated the industrial deployment of DP models with large privacy parameter (e.g., $\varepsilon \geq 7$), and it has been observed empirically that DP with large $\varepsilon$ can successfully defend against state-of-the-art MIAs. Existing DP theory cannot explain these empirical findings: e.g., the theoretical privacy guarantees of $\varepsilon \geq 7$ are essentially vacuous. In this paper, we aim to close this gap between theory and practice and understand why a large DP parameter can prevent practical MIAs. To tackle this problem, we propose a new privacy notion called practical membership privacy (PMP). PMP models a practical attacker's uncertainty about the contents of the private data. The PMP parameter has a natural interpretation in terms of the success rate of a practical MIA on a given data set. We quantitatively analyze the PMP parameter of two fundamental DP mechanisms: the exponential mechanism and Gaussian mechanism. Our analysis reveals that a large DP parameter often translates into a much smaller PMP parameter, which guarantees strong privacy against practical MIAs. Using our findings, we offer principled guidance for practitioners in choosing the DP parameter.
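To see why the worst-case guarantee is called essentially vacuous at $\varepsilon \geq 7$, it can help to evaluate the standard bound that pure $\varepsilon$-DP places on a membership attacker's accuracy under a balanced prior, $e^{\varepsilon}/(1+e^{\varepsilon})$. The sketch below is illustrative only: the function name `max_mia_accuracy` and the balanced-prior setting are assumptions made here, not notation from the paper.

```python
import math


def max_mia_accuracy(eps: float) -> float:
    """Worst-case bound on MIA accuracy implied by pure eps-DP.

    Assumes a balanced prior (member / non-member equally likely).
    It follows from TPR <= e^eps * FPR and TNR <= e^eps * FNR.
    """
    return math.exp(eps) / (1.0 + math.exp(eps))


if __name__ == "__main__":
    # Small eps keeps the bound near coin-flipping; large eps lets it approach 1.
    for eps in (0.1, 1.0, 7.0):
        print(f"eps = {eps}: accuracy bound = {max_mia_accuracy(eps):.4f}")
```

At $\varepsilon = 7$ the bound exceeds 0.999, so the worst-case theory alone does not rule out a nearly perfect attack.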

Paper Structure

This paper contains 16 sections, 17 theorems, 32 equations, and 6 figures.

Key Result

Lemma 3

Let $X \in \mathcal{X}^{2n}$, $x \in X$, $X_{\text{in}}(x) := \{D \subset X : |D| = n, x \in D\}$, and $X_{\text{out}}(x) := \{D \subset X : |D| = n, x \notin D\}$. Let $S \subset \mathcal{Z}$ be a measurable set. If [equation missing] then [equation (b) missing]. Also, (b) holds iff [equation (c) missing], where $N := |X_{\text{in}}(x)| = |X_{\text{out}}(x)| = \binom{2n}{n}/2$ and the probabilities in (c) are taken solely over the randomness of ...
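The counting identity $N = \binom{2n}{n}/2$ in the lemma can be checked by brute force for small $n$. The sketch below reads the membership conditions as $x \in D$ and $x \notin D$ and treats the $2n$ records as distinct; the helper name `split_by_membership` is hypothetical.

```python
from itertools import combinations
from math import comb


def split_by_membership(X, x, n):
    """Enumerate all size-n subsets D of X and split them by whether x is in D."""
    subsets = [frozenset(D) for D in combinations(X, n)]
    x_in = [D for D in subsets if x in D]
    x_out = [D for D in subsets if x not in D]
    return x_in, x_out


n = 3
X = list(range(2 * n))          # 2n distinct placeholder records
x_in, x_out = split_by_membership(X, X[0], n)
N = comb(2 * n, n) // 2         # value claimed in the lemma

print(len(x_in), len(x_out), N)
```

Both families have the same size because $\binom{2n-1}{n-1} = \binom{2n-1}{n}$, and together they account for all $\binom{2n}{n}$ subsets.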

Figures (6)

  • Figure 1: Ratios vs. $\sigma$, with $1$-dim. data, $n=6$, $m=10$, $C = 10$, $\varepsilon(X) = 5$.
  • Figure 2: Ratios vs. Clip threshold $C$, with $5$-dim. data, $n=6$, $2$ outliers, $m=32$, $\sigma = 1$, $\varepsilon(X) = 10$.
  • Figure 3: Ratios vs. Dimension of data $d$, with $n=6$, $m=10$, $\sigma = 1$, $\varepsilon(X) = 2$.
  • Figure 4: Ratios vs. $\varepsilon(X)$, with $n=100$, $d = 20$, $C=50$, no outliers, $\sigma = 1$, $\delta = 10^{-2}$.
  • Figure 5: Ratios vs. Clip threshold $C$, with $n = 100$, $d = 10$, 2 outliers, $\sigma = 5$, $\delta = 10^{-2}$.
  • ...and 1 more figure
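The captions above vary a noise scale $\sigma$, a clip threshold $C$, and a data dimension $d$. For orientation, a generic Gaussian mechanism with norm clipping might look like the sketch below. This is a textbook-style illustration under assumptions made here (a noisy mean query, noise added directly to the mean, hypothetical function names); it is not claimed to match the exact experimental setup behind the figures.

```python
import math
import random


def clip(v, C):
    """Rescale v so that its Euclidean norm is at most C."""
    norm = math.sqrt(sum(t * t for t in v))
    if norm <= C:
        return list(v)
    return [t * C / norm for t in v]


def gaussian_mechanism_mean(D, C, sigma, rng):
    """Release mean(clip(x, C) for x in D) plus N(0, sigma^2 I) noise.

    Assumption: noise is added to the mean of the clipped records.
    """
    d = len(D[0])
    clipped = [clip(x, C) for x in D]
    mean = [sum(x[j] for x in clipped) / len(D) for j in range(d)]
    return [m + rng.gauss(0.0, sigma) for m in mean]


rng = random.Random(0)
D = [[1.0, 2.0], [0.5, -1.0], [30.0, 40.0]]   # last record acts as an outlier
print(gaussian_mechanism_mean(D, C=10.0, sigma=1.0, rng=rng))
```

Clipping bounds how much any single record, including an outlier, can move the released mean, which is why $C$ and $\sigma$ jointly control the privacy level.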

Theorems & Definitions (28)

  • Definition 1: Differential Privacy (Dwork et al., 2006)
  • Definition 2: Practical Membership Privacy
  • Lemma 3
  • Corollary 4
  • Proposition 5
  • Lemma 6
  • Definition 7: Exponential Mechanism
  • Lemma 8
  • Proposition 9
  • Lemma 10
  • ...and 18 more
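Definition 7 in the list above refers to the exponential mechanism. Its statement is not reproduced on this page, so the sketch below uses the common textbook form over a finite candidate set, where an output is sampled with probability proportional to $\exp(\varepsilon \cdot \text{score} / (2\Delta))$. The function names, the finite output space, and the score/sensitivity parameterization are assumptions for illustration and may differ from the paper's definition.

```python
import math
import random


def exponential_mechanism_probs(scores, eps, sensitivity):
    """Sampling probabilities proportional to exp(eps * score / (2 * sensitivity))."""
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(eps * (s - m) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism_sample(candidates, scores, eps, sensitivity, rng):
    """Draw one candidate according to the exponential-mechanism distribution."""
    probs = exponential_mechanism_probs(scores, eps, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


rng = random.Random(0)
candidates = ["a", "b", "c"]
scores = [1.0, 2.0, 3.0]        # higher score = more useful output
print(exponential_mechanism_probs(scores, eps=1.0, sensitivity=1.0))
print(exponential_mechanism_sample(candidates, scores, 1.0, 1.0, rng))
```

With $\varepsilon = 0$ the distribution is uniform, and as $\varepsilon$ grows it concentrates on the highest-scoring candidate.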