Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Jamie Hayes, Borja Balle, Flavio P. Calmon, Jean Louis Raisaro

TL;DR

This work reframes differential privacy through the hypothesis-testing lens of $f$-DP to unify three operational privacy risks: re-identification, attribute inference, and data reconstruction. It derives a single, tight bound that applies across these risks under a strong adversary model and allows risk assessment to be tuned to arbitrary baseline risk levels. The framework supports practical computation for complex DP mechanisms (e.g., DP-SGD) and demonstrates tangible utility gains in tasks like text sentiment analysis and Census data releases. Overall, it provides a principled, adaptable method to interpret and calibrate DP protections against realistic privacy threats, enabling better utility at controlled risk.

Abstract

Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks -- re-identification, attribute inference, and data reconstruction -- are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary, including worst-case, levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., an accuracy increase from 52% to 70% in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.

Paper Structure

This paper contains 67 sections, 28 theorems, 79 equations, 10 figures, 1 table.

Key Result

Lemma 2.1

An algorithm $M: 2^\mathbb{D} \to \Theta$ satisfies $f$-DP iff for any measurable $E \subseteq \Theta$ and $S \simeq S'$:
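In the standard hypothesis-testing formulation of $f$-DP, a condition of this kind reads $\Pr[M(S) \in E] \le 1 - f(\Pr[M(S') \in E])$, where $f$ is the trade-off function; that form is assumed here rather than quoted from the lemma. As a minimal sketch, the check below uses a one-dimensional Gaussian mechanism with unit sensitivity and unit noise scale, so that $M(S') \sim \mathcal{N}(0, 1)$ and $M(S) \sim \mathcal{N}(\mu, 1)$, with the well-known trade-off function $f(\alpha) = \Phi(\Phi^{-1}(1 - \alpha) - \mu)$. All function names and the choice of events are illustrative and do not come from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Trade-off function of N(0, 1) vs N(mu, 1): Phi(Phi^{-1}(1 - alpha) - mu)."""
    q = 1.0 - alpha
    if q >= 1.0:  # alpha = 0: the test never rejects, so type II error is 1
        return 1.0
    if q <= 0.0:  # alpha = 1: the test always rejects, so type II error is 0
        return 0.0
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(q) - mu)


def tail_prob(mean: float, t: float) -> float:
    """Pr[N(mean, 1) > t], i.e. the probability of the event E = (t, inf)."""
    return 1.0 - NormalDist(mean, 1.0).cdf(t)


def interval_prob(mean: float, a: float, b: float) -> float:
    """Pr[N(mean, 1) in (a, b)], i.e. the probability of the event E = (a, b)."""
    dist = NormalDist(mean, 1.0)
    return dist.cdf(b) - dist.cdf(a)


def lemma_gap(p_s: float, p_s_prime: float, mu: float) -> float:
    """Slack in the assumed condition: (1 - f(Pr[M(S') in E])) - Pr[M(S) in E].

    A non-negative value means the event E respects the bound.
    """
    return (1.0 - gaussian_tradeoff(p_s_prime, mu)) - p_s


if __name__ == "__main__":
    mu = 1.0  # hypothetical sensitivity-to-noise ratio
    # Threshold events are the optimal (likelihood-ratio) tests: slack is ~0.
    for t in (-2.0, 0.0, 2.0):
        print("threshold", t, lemma_gap(tail_prob(mu, t), tail_prob(0.0, t), mu))
    # A sub-optimal event such as an interval leaves strictly positive slack.
    print("interval", lemma_gap(interval_prob(mu, 0.5, 1.5),
                                interval_prob(0.0, 0.5, 1.5), mu))
```

Threshold events attain the bound with equality up to floating-point error, because they are the likelihood-ratio tests for two Gaussians with equal variance; any other event can only do worse.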

Figures (10)

  • Figure 1: Our results offer a unified and more precise way to interpret and calibrate DP mechanisms in terms of re-identification, attribute inference, and data reconstruction risks. Top: The success of all these attacks cannot be higher than the power of the worst-case membership inference attack, which we immediately obtain from the $f$ function in $f$-DP, the decision-theoretic characterization of privacy mechanisms. Bottom left: Our results make it possible to add $\approx 20\%$ less noise at any given level of risk compared to prior methods (see the experiments section for details). Bottom middle: Less noise translates into $\approx 18$ pp higher task accuracy (shown: DP-SGD for sentiment classification with GPT-2). Bottom right: The unifying risk measure is tunable: we can either estimate post-release risk for any given level of baseline risk, or measure the worst-case risk (shown: US 2020 Census state-level data).
  • Figure 2: Our bound on predicate singling out in the strong threat model (SPSO) is always non-vacuous and, surprisingly, shows significantly lower risk than bounds in the PSO threat model. Risk is shown as $\mathsf{adv} = \mathsf{succ} - \mathsf{base}$ for a fixed $\mathsf{base}$, for the Gaussian mechanism with $\varepsilon$ calculated for $\delta = 10^{-5}$.
  • Figure 3: Our bound on reconstruction robustness shows lower risk than prior bounds. Risk is shown as $\mathsf{adv} = \mathsf{succ} - \mathsf{base}$ for three different baseline values, for the Gaussian mechanism with $\varepsilon$ calculated for $\delta = 10^{-5}$.
  • Figure 4: Our method shows up to 33% lower worst-case reconstruction risk in the US 2020 Census release than the prior method. The x-axis shows the granularity levels of the release; the y-axis shows attack risk as $\mathsf{succ} - \mathsf{base}$ for the worst-case baseline.
  • Figure 5: We can significantly tighten the bounds in the setting of binary attribute inference (SAI, via $f$-DP), outperforming the prior bound based on Fano's inequality.
  • ...and 5 more figures
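
The captions of Figures 2 and 3 report $\varepsilon$ "calculated for $\delta = 10^{-5}$" for the Gaussian mechanism. One common way to do such a conversion, assumed here rather than taken from the paper, uses the closed-form curve $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\,\Phi(-\varepsilon/\mu - \mu/2)$ of a $\mu$-Gaussian-DP mechanism and inverts it by bisection. The names `gdp_delta` and `gdp_epsilon` are hypothetical, and $\mu$ stands for the sensitivity divided by the noise standard deviation.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def gdp_delta(eps: float, mu: float) -> float:
    """delta(eps) for a mu-Gaussian-DP mechanism (closed-form curve)."""
    return _PHI(-eps / mu + mu / 2.0) - math.exp(eps) * _PHI(-eps / mu - mu / 2.0)


def gdp_epsilon(mu: float, delta: float = 1e-5) -> float:
    """Smallest eps (up to bisection accuracy) with gdp_delta(eps, mu) <= delta."""
    if gdp_delta(0.0, mu) <= delta:
        return 0.0
    lo, hi = 0.0, 1.0
    # delta(eps) decreases in eps, so grow the bracket until it contains the root.
    while gdp_delta(hi, mu) > delta:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0):  # illustrative noise levels only
        print(mu, gdp_epsilon(mu))
```

Because the conversion fixes a single $\delta$, it keeps only one point of the privacy curve, whereas the trade-off function $f$ retains the whole curve.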

Theorems & Definitions (56)

  • Lemma 2.1
  • Definition 3.1: PSO security
  • Definition 3.2: SPSO security
  • Definition 3.3: SRR security
  • Definition 3.4: SAI security
  • Lemma 3.1
  • Theorem 3.1: Informal
  • Theorem 3.2: Informal
  • Theorem 3.3: Informal
  • Definition A.1: Dwork et al. (2006)
  • ...and 46 more