How Private are DP-SGD Implementations?

Lynn Chua; Badih Ghazi; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; Amer Sinha; Chiyuan Zhang

TL;DR

This work studies how the choice of batch sampler in DP-SGD affects its privacy guarantee. DP-SGD is analyzed through the Adaptive Batch Linear Queries (ABLQ) mechanism, and the privacy curves δ_B(ε) are compared for deterministic, Poisson, and shuffle batch samplers. Using dominating pairs and the hockey-stick divergence, the authors show that shuffling is always at least as private as deterministic batching, whereas Poisson subsampling can be either more or less private than deterministic batching depending on ε, and that the amplification from shuffling can be substantially weaker than the Poisson analysis suggests. They give tightly dominating pairs for the deterministic and Poisson cases (with a closed form for the deterministic curve), note that no tightly dominating pair has been proven for shuffling, and present numerical evidence, computed with PLD and RDP accountants, of how privacy parameters can be misreported when a shuffle-based implementation is analyzed as if it used Poisson subsampling. The results show that the batch sampler materially affects the privacy that DP-SGD actually provides, which calls for sampler-specific privacy accounting in practice and motivates further work on tight analysis of shuffling and on multi-epoch settings.

Abstract

We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling-based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.
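The three batch samplers being compared can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of Python's `random` module, and the choice of sampling probability `b / n` for Poisson subsampling (so that the expected batch size is `b`) are assumptions made here for concreteness.

```python
import random


def deterministic_batches(n, b):
    """Fixed order: consecutive batches of size b over indices 0..n-1."""
    return [list(range(i, min(i + b, n))) for i in range(0, n, b)]


def shuffle_batches(n, b, rng):
    """Shuffle once, then cut the permutation into consecutive batches."""
    perm = list(range(n))
    rng.shuffle(perm)
    return [perm[i:i + b] for i in range(0, n, b)]


def poisson_batches(n, b, num_steps, rng):
    """At every step, include each index independently with probability b / n.

    Batch sizes are random, and an index may appear in several batches
    or in none at all.
    """
    q = b / n
    return [[i for i in range(n) if rng.random() < q] for _ in range(num_steps)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(deterministic_batches(6, 2))
    print(shuffle_batches(6, 2, rng))
    print(poisson_batches(6, 2, 3, rng))
```

With deterministic and shuffle batching every example lands in exactly one batch per pass, while Poisson subsampling yields variable-size batches, which is one reason it is harder to implement at scale.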

Paper Structure (24 sections, 8 theorems, 17 equations, 4 figures, 4 algorithms)

Introduction
Adaptive Batch Linear Queries and Batch Samplers.
Our Contributions
Technical Overview
Related work.
Differential Privacy
Adaptive Batch Linear Queries Mechanism
Adjacency Notions
Hockey Stick Divergence
Dominating Pairs for $\textsc{ABLQ}_{\mathcal{B}}$
Tightly dominating pair for $\textsc{ABLQ}_{\mathcal{D}}$.
Tightly dominating pair of $\textsc{ABLQ}_{\mathcal{P}}$.
Tightly dominating pair for $\textsc{ABLQ}_{\mathcal{S}}$?
Privacy Loss Comparisons
$\textsc{ABLQ}_{\mathcal{D}}$ vs $\textsc{ABLQ}_{\mathcal{S}}$
...and 9 more sections

Key Result

Proposition 3.1

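For all $\varepsilon \ge 0$, it holds that

$$\delta_{\mathcal{D}}(\varepsilon) = \Phi\left(-\sigma\varepsilon + \frac{1}{2\sigma}\right) - e^{\varepsilon}\,\Phi\left(-\sigma\varepsilon - \frac{1}{2\sigma}\right),$$

where $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard normal random variable $\mathcal{N}(0, 1)$.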
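For a numerical feel, the sketch below evaluates the deterministic-batching curve $\delta_{\mathcal{D}}(\varepsilon)$. It assumes the closed form is the analytic Gaussian mechanism expression of Balle and Wang (2018) with sensitivity 1 and noise scale $\sigma$, that is $\Phi(-\sigma\varepsilon + 1/(2\sigma)) - e^{\varepsilon}\Phi(-\sigma\varepsilon - 1/(2\sigma))$; the function names are hypothetical and only the standard library is used.

```python
import math


def std_normal_cdf(x):
    """CDF of N(0, 1), written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def delta_deterministic(eps, sigma):
    """Assumed closed form for delta_D(eps) with sensitivity 1 and noise sigma.

    Note: math.exp(eps) overflows for very large eps; this sketch does not
    guard against that.
    """
    a = -sigma * eps + 1.0 / (2.0 * sigma)
    b = -sigma * eps - 1.0 / (2.0 * sigma)
    return std_normal_cdf(a) - math.exp(eps) * std_normal_cdf(b)


if __name__ == "__main__":
    for eps in (0.0, 1.0, 4.0):
        print(eps, delta_deterministic(eps, sigma=1.0))
```

At $\varepsilon = 0$ the expression reduces to $2\Phi(1/(2\sigma)) - 1$, the total variation distance between $\mathcal{N}(0, \sigma^2)$ and $\mathcal{N}(1, \sigma^2)$, which gives a quick sanity check on the implementation.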

Figures (4)

Figure 1: Privacy parameter $\varepsilon$ for different noise parameters $\sigma$, for fixed $\delta = 10^{-6}$ and number of steps $T = 10,000$. $\varepsilon_{\mathcal{D}}$ : for deterministic batching, $\varepsilon_{\mathcal{P}}$ : upper bounds when using Poisson subsampling (computed using different accountants), and $\varepsilon_{\mathcal{S}}$ : a lower bound when using shuffling. We observe that shuffling does not provide much amplification for small values of $\sigma$, incurring significantly higher privacy cost compared to Poisson subsampling.
Figure 2: $\delta_{\mathcal{D}}(\varepsilon)$ and $\delta_{\mathcal{P}}(\varepsilon)$ for $\sigma=0.3$ and $T = 10$.
Figure 3: $\delta_{\mathcal{D}}(\varepsilon)$, upper bounds on $\delta_{\mathcal{P}}(\varepsilon)$ and a lower bound on $\delta_{\mathcal{S}}(\varepsilon)$ for varying $\varepsilon$ and fixed $\sigma = 0.4$ and $T = 10,000$.
Figure 5: $\varepsilon_{\mathcal{D}}(\delta)$, upper bounds on $\varepsilon_{\mathcal{P}}(\delta)$ and a lower bound on $\varepsilon_{\mathcal{S}}(\delta)$ for varying $\sigma$ and fixed $\delta = 10^{-5}$ and $T = 1000$.

Theorems & Definitions (19)

Definition 2.1: DP
Definition 2.2
Definition 2.3: Dominating Pair (zhu22optimal)
Proposition 3.1: Theorem 8 in (balle18improving)
Conjecture 3.2
Theorem 4.1
Theorem 4.2
Conjecture 4.3
Lemma 2.1: Joint Convexity of Hockey Stick Divergence
proof
...and 9 more
