Closed-Form Bounds for DP-SGD against Record-level Inference
Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
TL;DR
This work introduces a threat-specific privacy analysis for DP-SGD by modeling it as an information-theoretic channel and deriving closed-form Bayes-security bounds against membership inference attacks (MIA) and attribute inference (AI). By approximating intermediate gradients as Gaussian and reducing the analysis to two worst-case challenge points, the authors obtain a tractable bound: $\beta^*(P_{O|S}) \ge 1 - \mathrm{erf}\left( p\sqrt{T}\,\Delta_f /(2\sqrt{2}\,\sigma\,C)\right) - O\left(\sqrt{pT}/\sigma\right)$, where $\Delta_f$ captures gradient sensitivity across secrets. They specialize this bound to MIA, yielding $\beta^* \ge 1 - \mathrm{erf}\left( p\sqrt{T}/(\sqrt{2}\,\sigma) \right) - O\left(\sqrt{pT}/\sigma\right)$, and to AI, yielding data-dependent bounds based on the gradient-sensitivity vector $R$. They compare these bounds against DP accountants and evaluate efficiency, tightness, and utility implications on the Adult and Purchase datasets. The results show that models are substantially more secure against AI than against MIA in many settings, that the bounds are cheap enough to support interactive parameter tuning, and that privacy-utility gains are possible when protection against weaker threats suffices. The paper also discusses inference-time attackers and future directions for tightening the bounds using influence functions and taint analysis.
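As a rough illustration of how cheap the closed-form MIA bound is to evaluate, the following Python sketch computes only its leading term, $1 - \mathrm{erf}(p\sqrt{T}/(\sqrt{2}\,\sigma))$. It assumes the usual DP-SGD reading of the symbols ($p$ as the sampling rate, $T$ as the number of steps, $\sigma$ as the noise multiplier), which the summary above does not spell out. The function name is hypothetical, and the $O(\sqrt{pT}/\sigma)$ correction is omitted because its constant is not given here, so the returned value overestimates the actual bound.

```python
import math


def mia_bayes_security_leading_term(p: float, sigma: float, steps: int) -> float:
    """Leading term of the MIA bound: 1 - erf(p * sqrt(T) / (sqrt(2) * sigma)).

    Assumptions (not stated in the summary): p is the sampling rate,
    sigma the noise multiplier, and steps the number of DP-SGD steps T.
    The O(sqrt(p * T) / sigma) correction term is NOT subtracted, so this
    value is an upper estimate of the lower bound on Bayes security.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    argument = p * math.sqrt(steps) / (math.sqrt(2.0) * sigma)
    return 1.0 - math.erf(argument)


if __name__ == "__main__":
    # Sweep the noise multiplier for a fixed, purely illustrative configuration.
    for sigma in (0.5, 1.0, 2.0, 4.0):
        value = mia_bayes_security_leading_term(p=0.01, sigma=sigma, steps=1000)
        print(f"sigma={sigma:>4}: leading term = {value:.4f}")
```

Because each evaluation is a single call to `erf`, a sweep over $\sigma$, $p$, or $T$ of this kind is what makes interactive parameter tuning feasible. Values closer to 1 indicate stronger security against membership inference.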
Abstract
Machine learning models trained with differentially-private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon,\delta)$-DP guarantee, meaningful bounds require a small enough privacy budget (i.e., injecting a large amount of noise), which results in a large loss in utility. This paper presents a new approach to evaluate the privacy of machine learning models against specific record-level threats, such as membership and attribute inference, without the indirection through DP. We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds. Our proofs model DP-SGD as an information-theoretic channel whose inputs are the secrets that an attacker wants to infer (e.g., membership of a data record) and whose outputs are the intermediate model parameters produced by iterative optimization. We obtain bounds for membership inference that match state-of-the-art techniques, whilst being orders of magnitude faster to compute. Additionally, we present a novel data-dependent bound against attribute inference. Our results provide a direct, interpretable, and practical way to evaluate the privacy of trained models against specific inference threats without sacrificing utility.
