Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy

Gauri Pradhan, Joonas Jälkö, Santiago Zanella-Bèguelin, Antti Honkela

TL;DR

This work argues that the conventional add/remove adjacency used in differential privacy can overstate protection for per-record attributes, motivating substitute adjacency as a more accurate privacy notion for attribute privacy. It introduces canary-based auditing tools to empirically assess DP under substitute adjacency and shows that leakage observed in real models often aligns with substitute-DP budgets rather than add/remove bounds. Through gradient-space and input-space canaries across natural and synthetic datasets, the authors demonstrate that attribute or label privacy can be violated beyond add/remove guarantees, especially under high subsampling. The findings have practical implications for how privacy guarantees are reported and interpreted in DP-enabled ML pipelines, and point to the need for auditing methods that explicitly account for substitute adjacency and attribute leakage.

Abstract

Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data. It can be interpreted as a bound on an adversary's capability to distinguish two adjacent datasets according to a chosen adjacency relation. In practice, most DP implementations use the add/remove adjacency relation, where two datasets are adjacent if one can be obtained from the other by adding or removing a single record, thereby protecting membership. In many ML applications, however, the goal is to protect attributes of individual records (e.g., labels used in supervised fine-tuning). We show that privacy accounting under add/remove overstates attribute privacy compared to accounting under the substitute adjacency relation, which permits substituting one record. To demonstrate this gap, we develop novel attacks to audit DP under substitute adjacency, and show empirically that audit results are inconsistent with DP guarantees reported under add/remove, yet remain consistent with the budget accounted under the substitute adjacency relation. Our results highlight that the choice of adjacency when reporting DP guarantees is critical when the protection target is per-record attributes rather than membership.

Paper Structure

This paper contains 27 sections, 2 theorems, 7 equations, 13 figures, 2 tables, 5 algorithms.

Key Result

Theorem 4.1

Any algorithm $\mathcal{M}$ which satisfies ($\varepsilon_{AR},\delta_{AR}, \sim_{AR}$)-DP is ($\varepsilon_{S},\delta_{S}, \sim_{S}$)-DP with $\varepsilon_S = 2\varepsilon_{AR}$ and $\delta_S = (1 + e^{\varepsilon_{AR}})\delta_{AR}$.
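
The conversion in Theorem 4.1 is group privacy for a group of size two: substituting one record can be viewed as removing it and then adding its replacement, which is two add/remove steps. A minimal sketch of the arithmetic follows; the function name `add_remove_to_substitute` is hypothetical and is not taken from the paper.

```python
import math


def add_remove_to_substitute(eps_ar: float, delta_ar: float) -> tuple[float, float]:
    """Apply the Theorem 4.1 conversion (illustrative helper, name is an assumption).

    eps_S   = 2 * eps_AR
    delta_S = (1 + exp(eps_AR)) * delta_AR
    """
    if eps_ar < 0 or not (0.0 <= delta_ar <= 1.0):
        raise ValueError("expected eps_ar >= 0 and 0 <= delta_ar <= 1")
    eps_s = 2.0 * eps_ar
    delta_s = (1.0 + math.exp(eps_ar)) * delta_ar
    # A delta above 1 carries no information, so cap it.
    return eps_s, min(delta_s, 1.0)


if __name__ == "__main__":
    eps_s, delta_s = add_remove_to_substitute(1.0, 1e-5)
    print(f"eps_S = {eps_s}, delta_S = {delta_s:.3e}")
```

Note that both parameters change: a guarantee reported at $\delta_{AR} = 10^{-5}$ does not translate to a substitute guarantee at the same $\delta$.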

Figures (13)

  • Figure 1: Adversary's prior knowledge in each auditing scenario described in Table \ref{tab:class_of_attacks}.
  • Figure 2: Auditing DP using worst-case dataset canaries based on substitute adjacency. When the adversary crafts the neighbouring datasets as worst-case dataset canaries, we find that the empirical privacy leakage for a DP algorithm, $\varepsilon$ (Auditing), exceeds the privacy upper bound for add/remove DP, $\varepsilon_{AR}$ (Accounting). It closely tracks the privacy budget predicted by the substitute accountant, $\varepsilon_S$ (Accounting). The plot shows that $\varepsilon_S$ (Accounting) is tighter than $\varepsilon_S$ (Group Privacy) computed using Theorem \ref{th:ar_to_s}. We fix $\delta_{\text{target}} = 10^{-5}, C=1.0$ and $T=500$. The auditing estimates are averaged over $3$ repeats. For each repeat, we use $R=25$K runs to estimate $\varepsilon$ (Auditing) at the final step of training. The error bars represent $\pm 2$ standard errors around the mean computed over $3$ repeats of the auditing algorithm.
  • Figure 3: Auditing models trained with DP using natural datasets. We fine-tune the final layer of ViT-B-16 models pretrained on ImageNet21K using CIFAR10. The privacy leakage ($\varepsilon$) audited using our proposed canaries for this setting exceeds the add/remove DP upper bounds, $\varepsilon_{AR}$ (Accounting). As these canaries are used to mount a substitute-style attack, the figure shows that add/remove DP overestimates protection against such attacks. The efficacy of the canaries declines as the subsampling rate $q$ decreases, with the effect most significant for audits using input-space canaries. We plot $\varepsilon$ for every $k$th step $(k=25)$ of training averaged over 3 repeats of the auditing algorithm. For each repeat, we train $R=2500$ models, $1/2$ trained with $z$ and the remaining with $z'$. The error bars represent $\pm 2$ standard errors around the mean computed over $3$ repeats of the auditing algorithm.
  • Figure 4: Auditing an MLP model trained from scratch with random initialization using Purchase100. We find that auditing such models using input-space canaries yields weaker audits: the $\varepsilon$ from such audits does not exceed $\varepsilon_{AR}$ (Accounting). However, using crafted gradient canaries, we still obtain an audited $\varepsilon$ that is consistent with $\varepsilon_{S}$ (Accounting). We plot $\varepsilon$ for every $k$th step $(k=125)$ of training. We train $R=2500$ models, $1/2$ trained with $z$ and the remaining with $z'$.
  • Figure 5: Effect of the number of training runs $R$ on privacy auditing. For ViT-B-16 models with the final layer fine-tuned on CIFAR10 ($T=500, C=2.0$), we record the effect of changing $R$ on the empirical privacy leakage $\hat{\varepsilon}$ at the final step of training. The error bars represent $\pm 2$ standard errors around the mean computed over $3$ repeats of the auditing algorithm. In each repeat, $1/2$ of the models are trained with $z$ and the remaining with $z'$.
  • ...and 8 more figures
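
The captions above describe audits in which half of the $R$ models are trained with $z$ and the rest with $z'$, and an attack tries to tell the two apart. The sketch below shows one generic way such runs can be turned into an empirical $\varepsilon$: it uses the standard hypothesis-testing implication of $(\varepsilon,\delta)$-DP, namely $\mathrm{FPR} + e^{\varepsilon}\,\mathrm{FNR} \ge 1-\delta$ and its symmetric counterpart. This is an assumption for illustration, not the paper's estimator; the function name, the score convention, and the fixed threshold are hypothetical, and the result is a point estimate without confidence intervals.

```python
import math


def empirical_epsilon(scores_z, scores_z_prime, threshold, delta=1e-5):
    """Point estimate of an epsilon lower bound from a threshold attack.

    Illustrative only. The attack guesses "trained with z" when score >= threshold.
    scores_z:       attack scores from runs trained with z
    scores_z_prime: attack scores from runs trained with z'
    """
    # False negative rate: runs trained with z that the attack labels as z'.
    fnr = sum(s < threshold for s in scores_z) / len(scores_z)
    # False positive rate: runs trained with z' that the attack labels as z.
    fpr = sum(s >= threshold for s in scores_z_prime) / len(scores_z_prime)

    candidates = [0.0]
    # (eps, delta)-DP implies a + exp(eps) * b >= 1 - delta for both orderings.
    for a, b in ((fpr, fnr), (fnr, fpr)):
        numerator = 1.0 - delta - a
        if numerator > 0:
            candidates.append(math.inf if b == 0 else math.log(numerator / b))
    return max(candidates)


if __name__ == "__main__":
    # Toy scores: the attack is right on 9 of 10 runs in each half.
    with_z = [1.0] * 9 + [0.0]
    with_z_prime = [0.0] * 9 + [1.0]
    print(empirical_epsilon(with_z, with_z_prime, threshold=0.5, delta=0.0))
```

With few runs the observed error rates are noisy, which is why practical audits replace them with confidence bounds before taking the logarithm.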

Theorems & Definitions (3)

  • Definition 1: $(\varepsilon, \delta, \sim)$-Differential Privacy
  • Theorem 4.1 [DBLP:journals/fttcs/DworkR14]
  • Theorem 5.1 [gaussianDP]: Conversion from $\mu$-GDP to $(\varepsilon,\delta)$-DP
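
Theorem 5.1 is listed here by title only. Assuming it is the standard conversion from the Gaussian DP literature, a $\mu$-GDP mechanism is $(\varepsilon, \delta(\varepsilon))$-DP for every $\varepsilon \ge 0$ with $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\,\Phi(-\varepsilon/\mu - \mu/2)$, where $\Phi$ is the standard normal CDF. The sketch below evaluates that formula and inverts it by bisection; the function names are hypothetical, and the code is meant for moderate $\mu$ only.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc for better tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gdp_delta(mu: float, eps: float) -> float:
    """delta(eps) for a mu-GDP mechanism (assumed form of the conversion)."""
    return (std_normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2.0))


def gdp_epsilon(mu: float, delta: float, iters: int = 200) -> float:
    """Smallest eps with gdp_delta(mu, eps) <= delta, found by bisection.

    Relies on delta(eps) being decreasing in eps. Very large mu can
    overflow math.exp while the bracket is being widened.
    """
    if gdp_delta(mu, 0.0) <= delta:
        return 0.0
    lo, hi = 0.0, 1.0
    while gdp_delta(mu, hi) > delta:  # widen the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gdp_delta(mu, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(gdp_epsilon(mu=1.0, delta=1e-5))
```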