Balls-and-Bins Sampling for DP-SGD

Lynn Chua, Badih Ghazi, Charlie Harrison, Ethan Leeman, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

TL;DR

The paper proposes Balls-and-Bins sampling as a DP-SGD batch generator that mirrors Shuffle in implementation while preserving Poisson-like batch marginals, enabling favorable privacy amplification without sacrificing utility. By identifying a tightly dominating pair $(P_{\mathcal{B}}, Q_{\mathcal{B}})$ for the $\mathcal{ABLQ}_{\mathcal{B}}$ mechanism, it provides a rigorous DP characterization and demonstrates improved privacy guarantees over deterministic and shuffle batching, with practical parity to Poisson subsampling in many regimes. To make privacy accounting tractable, the authors develop importance-sampling and order-statistics sampling techniques for Monte Carlo estimation of $\delta_{\mathcal{B}}(\varepsilon)$, including lower bounds, and validate these methods on large-scale datasets where Balls-and-Bins attains competitive utility. The work lays out both practical benefits and several open questions, such as tight DP accounting for $\mathcal{ABLQ}_{\mathcal{B}}$ and extensions to multi-epoch training, while aligning with concurrent results that link Balls-and-Bins to non-asymptotic privacy guarantees. Overall, the approach offers a path to DP-SGD with Shuffle-like practicality and strong privacy amplification.

Abstract

We introduce Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling-based DP-SGD can have a much larger privacy cost in practical parameter regimes. In this work we show that Balls-and-Bins sampling achieves the "best of both" samplers. Its implementation is similar to that of Shuffling, and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with Shuffling at the same noise multiplier. At the same time, Balls-and-Bins sampling enjoys similar-or-better privacy amplification compared to Poisson subsampling in practical regimes.
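To make the comparison between the three samplers concrete, here is a minimal Python sketch of how each could generate the batches for one epoch. It assumes the usual description of Balls-and-Bins sampling, in which every example is placed independently into one uniformly random batch, and it assumes a Poisson sampling probability of $1/T$ so that the expected batch size is $n/T$. The function names and parameters (`shuffle_batches`, `poisson_batches`, `balls_and_bins_batches`, `n`, `T`) are illustrative and not taken from the paper.

```python
import numpy as np

def shuffle_batches(n, T, rng):
    """Shuffle: permute the n examples, then cut the permutation into T batches."""
    perm = rng.permutation(n)
    return [np.sort(b) for b in np.array_split(perm, T)]

def poisson_batches(n, T, rng):
    """Poisson subsampling (assumed rate 1/T): each example joins each batch
    independently, so an example may appear in several batches or in none."""
    mask = rng.random((T, n)) < 1.0 / T
    return [np.flatnonzero(row) for row in mask]

def balls_and_bins_batches(n, T, rng):
    """Balls-and-Bins (assumed form): each example (ball) is dropped into exactly
    one batch (bin) chosen uniformly at random, independently of the others."""
    bins = rng.integers(0, T, size=n)
    return [np.flatnonzero(bins == t) for t in range(T)]

rng = np.random.default_rng(0)
n, T = 12, 3
shuffled = shuffle_batches(n, T, rng)
poisson = poisson_batches(n, T, rng)
bnb = balls_and_bins_batches(n, T, rng)
```

Under these assumptions, Shuffle and Balls-and-Bins both use every example exactly once per epoch, while batch sizes are fixed only for Shuffle. Each individual Balls-and-Bins batch includes every example independently with probability $1/T$, which matches the per-batch marginal of Poisson subsampling.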

Paper Structure

This paper contains 21 sections, 14 theorems, 18 equations, 5 figures, 11 algorithms.

Key Result

Lemma 2.3

For distributions $P, Q$ over $\Omega$ and distributions $A, B$ over $\Gamma$, if there exists $f : \Omega \to \Gamma$ such that simultaneously $f(P) = A$ and $f(Q) = B$, then $(P, Q) \succcurlyeq (A, B)$. The converse also holds (see the lemma labeled lem:converse-post-process-and-domination in the paper).
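The forward direction of the lemma can be checked numerically on a small discrete example. The sketch below assumes the usual reading of $(P, Q) \succcurlyeq (A, B)$ as $D_{e^\varepsilon}(P \| Q) \ge D_{e^\varepsilon}(A \| B)$ for every $\varepsilon$, where $D_{e^\varepsilon}$ is the hockey stick divergence. The distributions `P`, `Q` and the map `f` are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np

def hockey_stick(p, q, eps):
    """D_{e^eps}(P || Q) = sum_x max(0, P(x) - e^eps * Q(x)) for discrete P, Q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.maximum(p - np.exp(eps) * q, 0.0).sum())

def push_forward(p, f, k):
    """Distribution of f(X) for X ~ p, where f[i] in range(k) is the image of outcome i."""
    out = np.zeros(k)
    for i, mass in enumerate(p):
        out[f[i]] += mass
    return out

# Illustrative distributions on a 4-point space (not from the paper).
P = [0.5, 0.3, 0.1, 0.1]
Q = [0.1, 0.1, 0.3, 0.5]
f = [0, 1, 0, 1]  # post-processing map that merges outcomes {0, 2} and {1, 3}

A, B = push_forward(P, f, 2), push_forward(Q, f, 2)

# Compare the two hockey stick curves on a grid of eps values.
eps_grid = np.linspace(-2.0, 2.0, 9)
gaps = [hockey_stick(P, Q, e) - hockey_stick(A, B, e) for e in eps_grid]
```

Every entry of `gaps` is non-negative up to floating-point error, which is what domination of $(A, B)$ by $(P, Q)$ means on this grid. A finite grid only illustrates the claim; it does not prove it.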

Figures (5)

  • Figure 1: AUC values for 1 epoch of training with the Criteo Display Ads pCTR dataset (top) and the Criteo Sponsored Search Conversion Log dataset (bottom). On the left, we train without privacy and vary the batch size. In the middle and right, we train privately with varying $\sigma$, using (expected) batch sizes 1024 (middle) and 8192 (right). We use a log scale to the left of the vertical dotted line at $\sigma = 0.1$, and a linear scale to the right.
  • Figure 2: Bounds on $\delta_{\mathcal{P}}$, $\delta_{\mathcal{S}}$, and $\delta_{\mathcal{B}}$ are plotted for various values of $\varepsilon$ for different (expected) batch sizes and values of $\sigma$. The mean and upper confidence bounds for $\delta_{\mathcal{B}}$ were obtained using order statistics sampling (the specific orders and sample complexity are given in the paper's training appendix, label app:training).
  • Figure 3: Upper confidence bounds on $\delta_{\mathcal{B}}(\varepsilon)$ against various values of $\varepsilon$ for two settings of $T$ and $\sigma$, with and without importance sampling. Additionally, lower bounds on $\delta_{\mathcal{B}}(\varepsilon)$ are included.
  • Figure 4: Upper confidence bounds on $\delta_{\mathcal{B}}(\varepsilon)$ against various values of $\varepsilon$ for two settings of $T$ and $\sigma$, with and without order statistics sampling for roughly the same running time complexity. Since order statistics sampling offers a significant speed up, it affords a larger sample complexity. Additionally, lower bounds on $\delta_{\mathcal{B}}(\varepsilon)$ are included.
  • Figure : $\mathcal{D}_{b,T}$: Deterministic Batch Generator
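The $\delta(\varepsilon)$ estimates in Figures 3 and 4 rest on Monte Carlo evaluation of a hockey stick divergence. The sketch below shows only the plain Monte Carlo idea, using the identity $D_{e^\varepsilon}(P \| Q) = \mathbb{E}_{x \sim P}[\max(0, 1 - e^{\varepsilon - L(x)})]$ with $L = \log(dP/dQ)$. As a stand-in it uses the single-Gaussian pair $P = \mathcal{N}(1, \sigma^2)$, $Q = \mathcal{N}(0, \sigma^2)$, for which a closed form is available as a check. This is an assumption for illustration: it is not the pair $(P_{\mathcal{B}}, Q_{\mathcal{B}})$ from the paper, and it does not implement the importance sampling or order statistics sampling techniques. All function names are hypothetical.

```python
import math
import numpy as np

def delta_monte_carlo(eps, sigma, num_samples, rng):
    """Plain Monte Carlo estimate of D_{e^eps}(P || Q) for the stand-in pair
    P = N(1, sigma^2), Q = N(0, sigma^2)."""
    x = rng.normal(1.0, sigma, size=num_samples)   # samples from P
    loss = (2.0 * x - 1.0) / (2.0 * sigma**2)      # log(dP/dQ) evaluated at x
    return float(np.mean(np.maximum(0.0, 1.0 - np.exp(eps - loss))))

def delta_closed_form(eps, sigma):
    """Exact hockey stick divergence for the same Gaussian pair (sensitivity 1)."""
    def Phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return Phi(0.5 / sigma - eps * sigma) - math.exp(eps) * Phi(-0.5 / sigma - eps * sigma)

rng = np.random.default_rng(0)
eps, sigma = 1.0, 1.0
estimate = delta_monte_carlo(eps, sigma, 200_000, rng)
exact = delta_closed_form(eps, sigma)
```

Plain averaging like this needs many samples when $\delta$ is small, because most samples then contribute zero to the mean. That cost is the motivation the summary gives for the paper's importance sampling and order statistics sampling techniques.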

Theorems & Definitions (24)

  • Definition 2.1: DP
  • Definition 2.2: Dominating Pair (zhu22optimal)
  • Lemma 2.3
  • Proposition 2.4 (balle18improving)
  • Proposition 2.5 (koskela2020computing, zhu22optimal)
  • Proposition 2.6 (chua24private)
  • Theorem 3.1
  • Proposition 3.2: Joint Convexity of Hockey Stick Divergence (see, e.g., Lemma B.1 in chua24private)
  • Proof: Proof of the theorem labeled thm:bnb-dominating-pair
  • Proposition 3.3
  • ...and 14 more