Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning

Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Flavio P. Calmon, Jamie Hayes, Borja Balle, Antti Honkela

TL;DR

This paper argues that reporting DP guarantees in ML should move beyond single $(\varepsilon,\delta)$ budgets and adopt non-asymptotic Gaussian Differential Privacy ($\mu$-GDP) as a concise, near-complete representation of the full privacy profile. It leverages open-source numerical accountants to compute the exact privacy trade-off curve and then derives the tight $\mu^*$ GDP bound, with optional regret testing to verify fit; when GDP is a poor fit, it recommends reporting the full privacy profile or $\rho$-zCDP as fallbacks. A central contribution is the introduction of a practical reporting framework that yields a single, comparable privacy parameter while preserving the ability to bound membership-inference attack risk; this framework is demonstrated on DP-SGD and the TopDown census algorithm, among others. The work provides theoretical and empirical support that GDP often matches the full DP profile in realistic ML settings, enables clearer communication to regulators and researchers, and is supported by a Python package for computation. Overall, the approach offers a principled path toward more informative and comparable privacy reporting in modern DP-enabled ML systems.
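The step from a numerically computed trade-off curve to a single conservative $\mu$ can be illustrated with a minimal sketch. It assumes the accountant has already produced the curve as sampled points $(\alpha_i, \beta_i)$ and uses the closed form $f_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$ for the GDP trade-off curve. The function names are hypothetical and are not the API of the paper's Python package, and the bound is only checked on the grid, so it omits the discretization handling that a truly non-asymptotic guarantee requires.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def gdp_tradeoff(alpha, mu):
    """Trade-off curve of mu-GDP: f_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)."""
    return _STD.cdf(_STD.inv_cdf(1.0 - alpha) - mu)


def tightest_mu_on_grid(alphas, betas):
    """Smallest mu such that f_mu(alpha_i) <= beta_i at every grid point.

    Solving f_mu(alpha) = beta for mu gives
    mu = Phi^{-1}(1 - alpha) - Phi^{-1}(beta); the conservative choice
    is the maximum of this quantity over the grid.
    """
    mu_star = 0.0
    for a, b in zip(alphas, betas):
        # inv_cdf is only defined on the open interval (0, 1)
        if not (0.0 < a < 1.0 and 0.0 < b < 1.0):
            continue
        mu_star = max(mu_star, _STD.inv_cdf(1.0 - a) - _STD.inv_cdf(b))
    return mu_star


if __name__ == "__main__":
    # Sanity check on a curve that is exactly 1-GDP (stand-in for accountant output).
    grid = [i / 1000 for i in range(1, 1000)]
    curve = [gdp_tradeoff(a, 1.0) for a in grid]
    print(tightest_mu_on_grid(grid, curve))
```

In practice the sampled curve would come from a numerical accountant rather than from `gdp_tradeoff` itself; the self-generated curve here only checks that the fit recovers a known $\mu$.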

Abstract

Current practices for reporting the level of differential privacy (DP) protection for machine learning (ML) algorithms such as DP-SGD provide an incomplete and potentially misleading picture of the privacy guarantees. For instance, if only a single $(\varepsilon,\delta)$ pair is known about a mechanism, standard analyses show that there exist highly accurate inference attacks against training data records, when, in fact, such accurate attacks might not exist. In this position paper, we argue that using non-asymptotic Gaussian Differential Privacy (GDP) as the primary means of communicating DP guarantees in ML avoids these potential downsides. Using two recent developments in the DP literature: (i) open-source numerical accountants capable of computing the privacy profile and $f$-DP curves of DP-SGD to arbitrary accuracy, and (ii) a decision-theoretic metric over DP representations, we show how to provide non-asymptotic bounds on GDP using numerical accountants, and show that GDP can capture the entire privacy profile of DP-SGD and related algorithms with virtually no error, as quantified by the metric. To support our claims, we investigate the privacy profiles of state-of-the-art DP large-scale image classification, and the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits their profiles remarkably well in all cases. We conclude with a discussion on the strengths and weaknesses of this approach, and discuss which other privacy mechanisms could benefit from GDP.
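The gap between a single $(\varepsilon,\delta)$ pair and a GDP guarantee can be made concrete with a small sketch. It assumes a mechanism that is exactly $\mu$-GDP, reads one $(\varepsilon,\delta(\varepsilon))$ pair off its privacy profile, and compares the largest attack true-positive rate each representation allows at a fixed false-positive rate. The formulas are the standard ones for GDP and for the $(\varepsilon,\delta)$ trade-off curve; the function names and the values $\mu = 1$, $\varepsilon = 4$ are illustrative assumptions, not figures from the paper.

```python
from math import exp
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def gdp_delta(eps, mu):
    """Privacy profile of mu-GDP:
    delta(eps) = Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2)."""
    return _STD.cdf(-eps / mu + mu / 2) - exp(eps) * _STD.cdf(-eps / mu - mu / 2)


def tpr_bound_adp(fpr, eps, delta):
    """Largest attack TPR at a given FPR allowed by one (eps, delta) pair,
    i.e. 1 - max{0, 1 - delta - e^eps * fpr, e^-eps * (1 - delta - fpr)}."""
    return min(1.0, exp(eps) * fpr + delta, 1.0 - exp(-eps) * (1.0 - delta - fpr))


def tpr_bound_gdp(fpr, mu):
    """Largest attack TPR at a given FPR allowed by mu-GDP:
    1 - f_mu(fpr) = Phi(Phi^{-1}(fpr) + mu)."""
    return _STD.cdf(_STD.inv_cdf(fpr) + mu)


if __name__ == "__main__":
    mu, eps = 1.0, 4.0              # illustrative values (assumption)
    delta = gdp_delta(eps, mu)      # a pair the mu-GDP mechanism satisfies
    for fpr in (0.001, 0.01, 0.1):
        print(fpr, tpr_bound_adp(fpr, eps, delta), tpr_bound_gdp(fpr, mu))
```

Because the $(\varepsilon,\delta(\varepsilon))$ pair is implied by the $\mu$-GDP guarantee, the GDP bound is never looser than the single-pair bound; the printed values show how much attack success the single pair leaves unexcluded.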

Paper Structure

This paper contains 46 sections, 14 theorems, 42 equations, 8 figures, 5 tables, 2 algorithms.

Key Result

Theorem 2.4

A mechanism $M$ satisfies $(\varepsilon, \delta(\varepsilon))$-DP iff it is $f$-DP with:
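The formula that completes this statement is not shown in this listing. As an assumption about the intended expression, the standard form of this equivalence from Dong et al. (2022), for a privacy profile $\delta(\varepsilon)$ defined for all $\varepsilon \geq 0$, is:

$$
f(\alpha) = \sup_{\varepsilon \geq 0} \max\left\{0,\; 1 - \delta(\varepsilon) - e^{\varepsilon}\alpha,\; e^{-\varepsilon}\left(1 - \delta(\varepsilon) - \alpha\right)\right\}
$$

Each inner maximum is the trade-off curve of a single $(\varepsilon, \delta(\varepsilon))$ guarantee, and the supremum takes the upper envelope of these curves over all $\varepsilon$.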

Figures (8)

  • Figure 1: Left: Comparison between the Laplace trade-off curve ($b=1$) and the DP trade-off curve with $\varepsilon = 1$. Higher means more private, hence the pure-DP guarantee is a valid and visually tight bound for the Laplace mechanism. Middle: Comparison between a DP-SGD trade-off curve ($\sigma = 9.4$, $T = 2000$, $q = 0.33$) from De et al. (2022) and a GDP guarantee. This shows that the GDP bound is tighter for DP-SGD than the $\varepsilon$-DP bound is for Laplace. Right: We quantify the regret from using the DP parameterization over the exact trade-off curve (a measure of "goodness-of-fit"). Lower means more accurate. We fix $\delta = 10^{-5}$. Although GDP is not universally the best representation (it is not the most accurate for Laplace), GDP is the most accurate concise representation for DP-SGD. We provide technical details in the appendix.
  • Figure 2: Illustration of the regret metric of Kaissis et al. (2024) between two mechanisms which satisfy $f$-DP and $\tilde{f}$-DP, respectively. The metric $\Delta(f, \tilde{f})$ is the smallest $\kappa \geq 0$ such that $f(\alpha + \kappa) - \kappa$ dominates $\tilde{f}$. We use it to quantify the regret of representing the true $f$-DP curve of a mechanism, obtained using numerical accounting, with curves $\tilde{f}$ associated with various guarantees used to quantify privacy: ADP, zCDP, and GDP.
  • Figure 3: Reporting pessimistic, non-asymptotic $\mu$-GDP
  • Figure 4: Worst-case regret values as a function of the sampling rate in DP-SGD for various choices of noise parameter $\sigma$ and numbers of compositions. We sweep over $T \in \{400, 1000, 2000\}$ compositions, with darker lines indicating more compositions.
  • Figure 5: Numerically evaluated trade-off curves and the best conservative $\mu$-GDP bounds for (a) The TopDown algorithm; and (b) Randomized Response.
  • ...and 3 more figures

Theorems & Definitions (32)

  • Definition 2.1: Dwork et al. (2006); Dwork and Roth (2014)
  • Definition 2.2: Mironov (2017); Bun and Steinke (2016)
  • Definition 2.3: Dong et al. (2022)
  • Theorem 2.4: Dong et al. (2022)
  • Definition 2.5: Dong et al. (2022)
  • Definition 2.6: Kaissis et al. (2024)
  • Proposition 4.0
  • Proposition 6.0
  • Proposition 6.0
  • Definition A.1: See, e.g., Asoodeh et al.
  • ...and 22 more