Optimal Rates for $O(1)$-Smooth DP-SCO with a Single Epoch and Large Batches

Christopher A. Choquette-Choo; Arun Ganesh; Abhradeep Thakurta

TL;DR

This work resolves a long-standing trade-off in DP stochastic convex optimization by designing Accelerated-DP-SRGD, a single-epoch algorithm that achieves near-optimal DP-SCO rates with sublinear batch-gradient complexity. By combining stochastic recursive gradients, Nesterov-style acceleration, and DP continual counting, the method needs $T=\tilde{O}(n^{1/4})$ batch steps (with $B=n/T$) when the unconstrained minimizer lies in the constraint set, and $T=\tilde{O}(\sqrt{n})$ with $B=\sqrt{n}$ in the general case, all while preserving $(ε, δ)$-DP and requiring only $2$ gradient evaluations per data point. The analysis jointly bounds utility and privacy via a potential-based approach, derives tight sensitivity bounds for gradient differences, and leverages the binary-tree mechanism to control cumulative DP noise. The results improve prior DP-SCO guarantees in the important single-epoch setting, with practical implications for privacy-preserving federated and streaming learning. The paper also discusses unaccelerated variants, removing common convexity assumptions, and extensions to non-convex losses via clipping, highlighting the method's broad applicability.
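As a rough illustration of the binary-tree mechanism mentioned above, the following Python sketch releases noisy prefix sums of a scalar stream. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the scalar stream, and the noise scale `sigma` are hypothetical, and calibrating `sigma` to a target $(ε, δ)$ is deliberately left out.

```python
import random


def dyadic_blocks(t):
    """Split the prefix [0, t) into disjoint dyadic intervals (start, length)."""
    blocks = []
    start = 0
    bit = 1 << t.bit_length()
    while bit:
        if t & bit:
            blocks.append((start, bit))
            start += bit
        bit >>= 1
    return blocks


def tree_prefix_sums(stream, sigma, seed=0):
    """Return noisy prefix sums of `stream` (illustrative sketch only).

    Each dyadic interval (tree node) gets one Gaussian noise draw that is
    reused by every prefix containing that node, so the prefix ending at
    step t sums at most t.bit_length() noise terms instead of t.
    `sigma` is a hypothetical noise scale; privacy calibration is omitted.
    """
    rng = random.Random(seed)
    node_noise = {}
    exact = [0.0]  # exact[i] = sum of the first i stream elements
    for x in stream:
        exact.append(exact[-1] + x)
    noisy = []
    for t in range(1, len(stream) + 1):
        total = 0.0
        for start, length in dyadic_blocks(t):
            key = (start, length)
            if key not in node_noise:
                node_noise[key] = rng.gauss(0.0, sigma)
            total += exact[start + length] - exact[start] + node_noise[key]
        noisy.append(total)
    return noisy
```

For example, the prefix of length 7 is covered by the three nodes `(0, 4)`, `(4, 2)`, and `(6, 1)`, so its noise has variance `3 * sigma**2` rather than `7 * sigma**2`. This logarithmic growth is what keeps the cumulative noise under control over many steps.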

Abstract

In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/δ)}}{εn}\right)$ under $(ε, δ)$-DP, and also well known that variants of DP-SGD can achieve the optimal rate in a single epoch. However, the batch gradient complexity (i.e., the number of adaptive optimization steps), which is important in applications like federated learning, is less well understood. In particular, all prior work on DP-SCO requires $Ω(n)$ batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors) in a single epoch using $\sqrt{n}$ batch gradient steps with batch size $\sqrt{n}$, and can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer is in the constraint set, we can further improve this to $n^{1/4}$ batch gradient steps with batch size $n^{3/4}$. To achieve this, our algorithm combines three key ingredients: a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.
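To make the quantities in the abstract concrete, the sketch below evaluates the stated rate and the two step-count and batch-size regimes for a given $n$. It is an illustration under stated assumptions: the function and argument names are hypothetical, and all constants and polylog factors hidden by the $O(\cdot)$ notation are dropped, so the outputs indicate scaling only.

```python
import math


def dp_sco_rate(n, p, eps, delta, L=1.0, R=1.0):
    """Rate from the abstract with constants dropped:
    LR/sqrt(n) + LR*sqrt(p*log(1/delta))/(eps*n)."""
    statistical = L * R / math.sqrt(n)
    privacy = L * R * math.sqrt(p * math.log(1.0 / delta)) / (eps * n)
    return statistical + privacy


def batch_schedule(n, minimizer_in_constraint_set=False):
    """Number of batch gradient steps T and batch size B, ignoring polylogs.

    General case: T ~ sqrt(n), B ~ sqrt(n).
    Global minimizer inside the constraint set: T ~ n^(1/4), B ~ n^(3/4).
    """
    exponent = 0.25 if minimizer_in_constraint_set else 0.5
    T = max(1, round(n ** exponent))
    B = n // T  # single epoch: the T batches use at most n examples in total
    return T, B
```

With `n = 10000`, `batch_schedule(n)` gives 100 steps of batch size 100, while `batch_schedule(n, True)` gives 10 steps of batch size 1000. In both cases `T * B` does not exceed `n`, matching the single-epoch setting.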

Paper Structure (17 sections, 15 theorems, 76 equations, 1 table, 1 algorithm)

Introduction
Our Contribution
Background on (Private) Learning
DP-SGD: DP stochastic gradient descent
DP-FTRL: DP follow the regularized leader
Our Algorithm
Overview of Analysis of Algorithm 1
Utility Analysis
Privacy analysis and DP-SCO bounds
Unaccelerated Variant
Utility and Privacy Analysis
Utility Analysis
Privacy analysis
Bounding gradient norms
Bounding sensitivity
...and 2 more sections

Key Result

Theorem 4.1

Fix some setting of the $\mathbf{b}_t$ such that $\left\|\mathbf{b}_t\right\|_2 \leq b_{max}$ for all $t$. For an appropriate setting of $\eta_t, \tau_t$ and $\beta \geq M$, suppose $\mathbb{E}[\left\|\Delta_t - \mathbb{E}[\Delta_t]\right\|_2^2] \leq S^2$. Then, with high probability over the random

Theorems & Definitions (27)

Theorem 4.1: Excess population risk; Simplification of Theorem 5.1
Theorem 4.2: Simplified version of Theorems \ref{thm:mainthm-1} and \ref{thm:mainthm-2}
Theorem 5.1: Excess population risk
proof
Lemma 5.2: Potential drop
proof: Proof of Lemma 5.2
Lemma 5.3
proof: Proof of Lemma 5.3
Lemma 5.4
proof
...and 17 more
