Differentially Private Stochastic Gradient Descent with Fixed-Size Minibatches: Tighter RDP Guarantees with or without Replacement

Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, Jason Pacheco

TL;DR

This work advances privacy accounting for DP-SGD with a holistic analysis of fixed-size subsampling, both without and with replacement (FSwoR and FSwR), covering add/remove and replace-one adjacency for FSwoR. The authors derive non-asymptotic Rényi-DP bounds with precise control of the Taylor-expansion remainder, achieving roughly a 4x improvement over prior fixed-size bounds and showing that FSwoR under replace-one adjacency matches Poisson subsampling to leading order in the sampling probability. They also provide nontrivial lower bounds, discuss the practical implications for gradient variance and memory usage, and report CIFAR-10 experiments that demonstrate tighter privacy guarantees and competitive accuracy. The results support preferring fixed-size subsampling in DP-SGD pipelines for its constant memory usage and tight privacy guarantees, with code publicly available for integration into existing DP libraries.

Abstract

Differentially private stochastic gradient descent (DP-SGD) has been instrumental in privately training deep learning models by providing a framework to control and track the privacy loss incurred during training. At the core of this computation lies a subsampling method that uses a privacy amplification lemma to enhance the privacy guarantees provided by the additive noise. Fixed-size subsampling is appealing for its constant memory usage, unlike the variable-sized minibatches in Poisson subsampling. It is also of interest in addressing class imbalance and federated learning. However, the current computable guarantees for fixed-size subsampling are not tight and do not consider both add/remove and replace-one adjacency relationships. We present a new and holistic Rényi differential privacy (RDP) accountant for DP-SGD with fixed-size subsampling without replacement (FSwoR) and with replacement (FSwR). For FSwoR we consider both add/remove and replace-one adjacency. Our FSwoR results improve on the best current computable bound by a factor of $4$. We also show for the first time that the widely used Poisson subsampling and FSwoR with replace-one adjacency have the same privacy to leading order in the sampling probability. Accordingly, our work suggests that FSwoR is often preferable to Poisson subsampling due to constant memory usage. Our FSwR accountant includes explicit non-asymptotic upper and lower bounds and, to the authors' knowledge, is the first such analysis of fixed-size RDP with replacement for DP-SGD. We analytically and empirically compare fixed-size and Poisson subsampling, and show that DP-SGD gradients in a fixed-size subsampling regime exhibit lower variance in practice in addition to memory usage benefits.
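To make the three subsampling schemes named in the abstract concrete, the following is a minimal NumPy sketch of how minibatch indices could be drawn under each. The function names are hypothetical and are not taken from the authors' code; the values $|D|=50{,}000$ and $|B|=120$ mirror those quoted in the figure captions. The sketch only illustrates sampling and batch sizes, not gradient clipping, noise addition, or privacy accounting.

```python
import numpy as np

def poisson_batch(n, q, rng):
    """Poisson subsampling: include each index independently with probability q.
    The resulting batch size is random (Binomial(n, q))."""
    return np.flatnonzero(rng.random(n) < q)

def fixed_size_without_replacement(n, batch_size, rng):
    """FSwoR: a uniformly random subset of exactly batch_size distinct indices."""
    return rng.choice(n, size=batch_size, replace=False)

def fixed_size_with_replacement(n, batch_size, rng):
    """FSwR: batch_size independent uniform draws; indices may repeat."""
    return rng.integers(0, n, size=batch_size)

rng = np.random.default_rng(0)
n, batch_size = 50_000, 120          # |D| and |B| as in the figure captions
q = batch_size / n                   # sampling probability q = |B| / |D|

# Batch sizes under Poisson subsampling fluctuate around |B| ...
poisson_sizes = [len(poisson_batch(n, q, rng)) for _ in range(200)]
# ... while both fixed-size schemes always return exactly |B| indices.
fswor = fixed_size_without_replacement(n, batch_size, rng)
fswr = fixed_size_with_replacement(n, batch_size, rng)

print("Poisson batch sizes: min", min(poisson_sizes), "max", max(poisson_sizes))
print("FSwoR batch size:", len(fswor), "FSwR batch size:", len(fswr))
```

The fluctuating Poisson batch size is what forces memory to be provisioned for the largest batch that might occur, whereas the fixed-size schemes keep the per-step footprint constant.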

Paper Structure

This paper contains 31 sections, 14 theorems, 149 equations, 10 figures.

Key Result

Theorem 3.1

Let $D\simeq_{a/r} D^\prime$ be adjacent datasets. With transition probabilities defined as in eq:DP_SGD_p_def and letting $q=|B|/|D|$ we have

Figures (10)

  • Figure 1: FS$_{\text{wR}}$-RDP lower bounds from Theorem 3.7 as a function of $\alpha$, with $\sigma=6$ and $q=0.001$.
  • Figure 2: Comparison of FS$_{\text{woR}}$-RDP bounds under replace-one adjacency from Theorem 3.4 for various choices of $m$ with the upper and lower bounds from [wang2019subsampled]. We used $\sigma_t=6$, $|B|=120$, and $|D|=50{,}000$.
  • Figure 3: FS$_{\text{woR}}$ $(\epsilon,\delta)$-DP guarantees under replace-one adjacency; comparison of bounds obtained using Theorem 3.4 for various choices of $m$ with those obtained from the upper and lower bounds from [wang2019subsampled]. Following [abadi2016deep], we used $\sigma_t=6$, $|B|=120$, $|D|=50{,}000$, and 250 training epochs.
  • Figure 4: Comparing privacy guarantees of FS$_{\text{woR}}$-RDP with Wang et al. and Poisson subsampled RDP (top). Comparing FS$_{\text{woR}}$-RDP performance against Poisson subsampled RDP (bottom). We used $\sigma_t=6$, $C=3$, $|B|=120$, $|D|=50{,}000$, and learning rate $10^{-3}$.
  • Figure 5: Comparing memory usage of FS-RDP with other Opacus privacy accountants in each training epoch. We used $|B|=120$ and $|D|=50{,}000$. Unlike other methods, FS-RDP's memory usage remains constant throughout training.
  • ...and 5 more figures
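Figure 3 reports $(\epsilon,\delta)$-DP guarantees obtained from RDP bounds. As a hedged illustration of that step, the sketch below applies the standard RDP-to-DP conversion, in which an $(\alpha,\epsilon)$-RDP guarantee implies $(\epsilon + \log(1/\delta)/(\alpha-1),\, \delta)$-DP, together with additive composition of RDP over training steps. The per-step RDP curve used here is that of the plain, unsubsampled Gaussian mechanism with unit sensitivity, $\epsilon(\alpha)=\alpha/(2\sigma^2)$; it is only a stand-in for the paper's FSwoR and FSwR bounds, which are not reproduced here. The function names, the grid of orders, and $\delta=10^{-5}$ are assumptions made for this example.

```python
import math

def gaussian_rdp(alpha, sigma):
    """RDP of the unsubsampled Gaussian mechanism with unit L2 sensitivity.
    Placeholder curve only; NOT the paper's subsampled bound."""
    return alpha / (2.0 * sigma ** 2)

def rdp_to_dp(rdp_fn, alphas, steps, delta):
    """Compose a per-step RDP curve over `steps` steps (RDP adds up), then
    convert to (eps, delta)-DP by minimizing over the candidate orders."""
    best_eps, best_alpha = math.inf, None
    for alpha in alphas:
        eps = steps * rdp_fn(alpha) + math.log(1.0 / delta) / (alpha - 1.0)
        if eps < best_eps:
            best_eps, best_alpha = eps, alpha
    return best_eps, best_alpha

sigma, delta = 6.0, 1e-5                            # delta is an assumed value
alphas = [1.0 + 0.1 * k for k in range(1, 1000)]    # orders 1.1, 1.2, ..., 100.9

eps, alpha_star = rdp_to_dp(lambda a: gaussian_rdp(a, sigma), alphas, steps=1, delta=delta)
print(f"eps = {eps:.4f} at alpha = {alpha_star:.1f}")
```

Swapping `gaussian_rdp` for a tighter subsampled RDP bound and setting `steps` to the total number of DP-SGD iterations gives the kind of curve the figure compares; the conversion itself is unchanged.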

Theorems & Definitions (26)

  • Definition 2.1: $(\alpha,\epsilon)$-RDP [mironov2019r]
  • Theorem 3.1
  • Theorem 3.2: Taylor Expansion Upper Bound
  • Theorem 3.3: $T$-step FS$_{\text{woR}}$-RDP Upper Bound: Add/Remove Adjacency
  • Theorem 3.4: FS$_{\text{woR}}$-RDP Upper Bounds for Replace-one Adjacency
  • Theorem 3.5: $T$-step FS$_{\text{woR}}$-RDP Upper Bound: Replace-one Adjacency
  • Theorem 3.6: Fixed-size RDP with Replacement Upper Bound
  • Theorem 3.7: Fixed-size RDP with Replacement Lower Bound
  • Lemma A.1
  • proof
  • ...and 16 more
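
For orientation, the notion of $(\alpha,\epsilon)$-RDP listed above as Definition 2.1 can be written out in its standard textbook form. The statement below is the usual formulation, not a quotation of the paper's exact wording, and the symbols $M$, $P$, and $Q$ are notation chosen for this note. For $\alpha>1$, the Rényi divergence of order $\alpha$ between distributions $P$ and $Q$ is

$$
D_\alpha(P\,\|\,Q) \;=\; \frac{1}{\alpha-1}\,\log \mathbb{E}_{x\sim Q}\!\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right],
$$

and a randomized mechanism $M$ is $(\alpha,\epsilon)$-RDP if

$$
D_\alpha\big(M(D)\,\|\,M(D^\prime)\big) \;\le\; \epsilon \quad \text{for all adjacent datasets } D \simeq D^\prime .
$$

The choice of adjacency relation (add/remove, written $\simeq_{a/r}$ in Theorem 3.1, or replace-one) determines which dataset pairs the bound must hold for, which is why the theorems above are stated separately for each.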