Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning
Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Flavio P. Calmon, Jamie Hayes, Borja Balle, Antti Honkela
TL;DR
This paper argues that reporting DP guarantees in ML should move beyond single $(\varepsilon,\delta)$ budgets and adopt non-asymptotic Gaussian Differential Privacy ($\mu$-GDP) as a concise, near-complete representation of the full privacy profile. It uses open-source numerical accountants to compute the privacy profile and trade-off curve to arbitrary accuracy, derives from them the tight GDP parameter $\mu^*$, and optionally checks the fit with a decision-theoretic regret measure. When GDP is a poor fit, it recommends reporting the full privacy profile or $\rho$-zCDP instead. Its central contribution is a practical reporting framework that yields a single, comparable privacy parameter while still bounding membership-inference attack risk. The framework is demonstrated on DP-SGD for large-scale image classification and on the TopDown algorithm for the U.S. Decennial Census, among others. The paper gives theoretical and empirical evidence that GDP often matches the full privacy profile in realistic ML settings, argues that this enables clearer communication with regulators and researchers, and provides a Python package for the computations. Overall, the approach offers a principled path toward more informative and comparable privacy reporting for DP-enabled ML systems.
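The core computation summarized above, finding the smallest $\mu$ whose GDP privacy profile upper-bounds a mechanism's profile, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' package. It assumes a numerical accountant has already evaluated the mechanism's privacy profile on a finite grid of $\varepsilon$ values. It uses the known closed form of the $\mu$-GDP profile, $\delta_\mu(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\Phi(-\varepsilon/\mu - \mu/2)$, and relies on $\delta_\mu(\varepsilon)$ growing with $\mu$, so bisection applies. The names `phi`, `gdp_delta`, and `tight_mu` are hypothetical. Because dominance is checked only at the grid points, the result is not by itself a rigorous non-asymptotic bound.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF; erfc keeps precision in the far lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gdp_delta(eps: float, mu: float) -> float:
    """Closed-form privacy profile delta(eps) of a mu-GDP mechanism."""
    return phi(-eps / mu + mu / 2.0) - math.exp(eps) * phi(-eps / mu - mu / 2.0)


def tight_mu(profile, mu_lo: float = 1e-3, mu_hi: float = 50.0, tol: float = 1e-6) -> float:
    """Smallest mu (up to tol) whose GDP profile dominates every (eps, delta) pair.

    `profile` is a list of (eps, delta) pairs, assumed to come from a numerical
    accountant (not shown here). Bisection is valid because gdp_delta(eps, mu)
    increases with mu for fixed eps.
    """
    def dominates(mu: float) -> bool:
        return all(gdp_delta(eps, mu) >= delta for eps, delta in profile)

    if not dominates(mu_hi):
        raise ValueError("profile exceeds the mu_hi bracket; increase mu_hi")
    while mu_hi - mu_lo > tol:
        mid = 0.5 * (mu_lo + mu_hi)
        if dominates(mid):
            mu_hi = mid  # mid already upper-bounds the profile: search lower
        else:
            mu_lo = mid  # some grid point is violated: mu must be larger
    return mu_hi


# Usage: a synthetic profile from a mu = 1.0 mechanism on eps = 0, 0.5, ..., 6.
grid = [0.5 * i for i in range(13)]
profile = [(eps, gdp_delta(eps, 1.0)) for eps in grid]
print(f"tight mu: {tight_mu(profile):.4f}")
```

Here the synthetic profile comes from a $\mu = 1$ mechanism, so the search should return a value very close to 1; with a real accountant, `profile` would hold its $(\varepsilon, \delta(\varepsilon))$ pairs instead.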
Abstract
Current practices for reporting the level of differential privacy (DP) protection for machine learning (ML) algorithms such as DP-SGD provide an incomplete and potentially misleading picture of the privacy guarantees. For instance, if only a single $(\varepsilon,\delta)$ pair is known about a mechanism, standard analyses show that there exist highly accurate inference attacks against training data records, when, in fact, such accurate attacks might not exist. In this position paper, we argue that using non-asymptotic Gaussian Differential Privacy (GDP) as the primary means of communicating DP guarantees in ML avoids these potential downsides. Using two recent developments in the DP literature: (i) open-source numerical accountants capable of computing the privacy profile and $f$-DP curves of DP-SGD to arbitrary accuracy, and (ii) a decision-theoretic metric over DP representations, we show how to obtain non-asymptotic GDP bounds from numerical accountants and demonstrate that GDP can capture the entire privacy profile of DP-SGD and related algorithms with virtually no error, as quantified by the metric. To support our claims, we investigate the privacy profiles of state-of-the-art large-scale image classification with DP and of the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits their profiles remarkably well in all cases. We conclude by discussing the strengths and weaknesses of this approach and which other privacy mechanisms could benefit from GDP.
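The abstract's first claim can be made concrete by comparing attack bounds. A single $(\varepsilon,\delta)$ pair only guarantees the trade-off curve $f_{\varepsilon,\delta}(\alpha) = \max\{0,\, 1-\delta-e^{\varepsilon}\alpha,\, e^{-\varepsilon}(1-\delta-\alpha)\}$, whereas $\mu$-GDP guarantees $f_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$; an attack's true-positive rate at false-positive rate $\alpha$ is at most $1 - f(\alpha)$. The sketch below is hypothetical: it takes an illustrative $\mu = 1$ mechanism (not a value from the paper), summarizes it as the single pair at $\delta = 10^{-5}$, and compares the two bounds. Helper names such as `eps_for_delta` and `tpr_bound_gdp` are made up for this example.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1), used for the inverse CDF


def phi(x: float) -> float:
    """Standard normal CDF; erfc keeps precision in the far lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gdp_delta(eps: float, mu: float) -> float:
    """Closed-form privacy profile delta(eps) of a mu-GDP mechanism."""
    return phi(-eps / mu + mu / 2.0) - math.exp(eps) * phi(-eps / mu - mu / 2.0)


def eps_for_delta(mu: float, delta: float, eps_hi: float = 50.0, tol: float = 1e-9) -> float:
    """Smallest eps with gdp_delta(eps, mu) <= delta (gdp_delta decreases in eps)."""
    eps_lo = 0.0
    while eps_hi - eps_lo > tol:
        mid = 0.5 * (eps_lo + eps_hi)
        if gdp_delta(mid, mu) <= delta:
            eps_hi = mid
        else:
            eps_lo = mid
    return eps_hi


def tpr_bound_eps_delta(fpr: float, eps: float, delta: float) -> float:
    """Max attack TPR at a given FPR permitted by one (eps, delta) pair alone."""
    return min(1.0,
               math.exp(eps) * fpr + delta,
               1.0 - math.exp(-eps) * (1.0 - delta - fpr))


def tpr_bound_gdp(fpr: float, mu: float) -> float:
    """Max attack TPR at a given FPR permitted by mu-GDP: Phi(Phi^{-1}(fpr) + mu)."""
    return phi(_STD.inv_cdf(fpr) + mu)


# The same hypothetical mechanism, reported two ways.
mu, delta = 1.0, 1e-5
eps = eps_for_delta(mu, delta)
for fpr in (0.001, 0.01, 0.1):
    print(f"FPR={fpr}: single (eps, delta) allows TPR <= "
          f"{tpr_bound_eps_delta(fpr, eps, delta):.3f}, "
          f"mu-GDP allows TPR <= {tpr_bound_gdp(fpr, mu):.3f}")
```

At low false-positive rates such as 0.001 and 0.01, the bound implied by the single pair is several times larger than the $\mu$-GDP bound, even though both describe the same mechanism.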
