Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Jamie Hayes, Borja Balle, Flavio P. Calmon, Jean Louis Raisaro
TL;DR
This work uses the hypothesis-testing interpretation of differential privacy ($f$-DP) to unify three operational privacy risks: re-identification, attribute inference, and data reconstruction. It derives a single, tight bound on attack success that holds across these risks under a strong adversary model and can be evaluated at arbitrary baseline risk levels. The framework supports practical computation for complex DP mechanisms (e.g., DP-SGD) and demonstrates utility gains in tasks such as text sentiment analysis and Census data releases. It thus gives a principled way to interpret and calibrate DP protections against realistic privacy threats, allowing better utility at a controlled level of risk.
Abstract
Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks -- re-identification, attribute inference, and data reconstruction -- are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP) and show that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across many attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary, including worst-case, levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, for example, an increase in accuracy from 52% to 70% in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.
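To make the hypothesis-testing view concrete, the following is a minimal sketch, not the authors' implementation. It assumes a mechanism satisfying $\mu$-Gaussian DP, whose trade-off function is $f(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, and it assumes that a bound on attack success at a given baseline risk can be read off as $1 - f(\text{baseline})$. The function names and the chosen value of $\mu$ are hypothetical. For comparison, it also evaluates the standard hypothesis-testing bound implied by pure $\varepsilon$-DP.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Trade-off function of mu-Gaussian DP.

    Returns the smallest false-negative rate any test can achieve at
    false-positive rate alpha: f(alpha) = Phi(Phi^{-1}(1 - alpha) - mu).
    """
    if alpha <= 0.0:
        return 1.0
    if alpha >= 1.0:
        return 0.0
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def max_attack_success(baseline: float, mu: float) -> float:
    """Assumed form of the bound: success <= 1 - f(baseline).

    `baseline` is the success rate an adversary would have without
    access to the mechanism's output (an illustrative parameter name).
    """
    return 1.0 - gaussian_tradeoff(baseline, mu)


def pure_dp_max_success(baseline: float, eps: float) -> float:
    """Same quantity under pure eps-DP, from its trade-off function."""
    return min(math.exp(eps) * baseline,
               1.0 - math.exp(-eps) * (1.0 - baseline))


if __name__ == "__main__":
    mu = 1.0  # hypothetical privacy level, chosen only for illustration
    for baseline in (0.001, 0.01, 0.1, 0.5):
        bound = max_attack_success(baseline, mu)
        print(f"baseline={baseline:<6} max success={bound:.4f}")
```

With $\mu = 0$ the bound equals the baseline, meaning the output gives the adversary no advantage, and the bound grows with $\mu$. Evaluating it at several baselines illustrates what "tunable" means in the abstract: the same trade-off curve yields a risk bound for each baseline level.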
