On the Gradient Complexity of Private Optimization with Private Oracles
Michael Menart, Aleksandar Nikolov
TL;DR
The paper establishes minimax lower bounds on the first-order oracle complexity of differentially private optimization of Lipschitz convex losses, using a private proxy oracle framework. The bounds reveal a dimension-dependent runtime penalty induced by privacy, and the paper provides near-matching upper bounds via DP-SGD in the non-smooth and smooth regimes. The proofs rely on information-theoretic and mean-estimation-inspired arguments that bound the information gained from private responses. For information-limited oracles, the paper shows that the oracle's information capacity fundamentally constrains the optimizer. It also demonstrates reductions between DP-ERM and DP-SCO, implying that DP-SCO is no harder than DP-ERM up to polylog factors. Together, the results clarify the computational cost of private optimization beyond the local privacy model and highlight the limitations of gradient quantization and small-batch strategies in privacy-preserving learning.
Abstract
We study the running time, in terms of first-order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We first consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $Ω\big(\min\{\frac{\sqrt{d}}{α^2}, \frac{d}{\log(1/α)}\}\big)$ is necessary to achieve $α$ excess risk on problems of dimension $d$ when $d \geq 1/α^2$. Upper bounds via DP-SGD show these results are tight when $d>\tilde{Ω}(1/α^4)$. We further show our lower bound can be strengthened to $Ω\big(\min\{\frac{d}{\bar{m}α^2}, \frac{d}{\log(1/α)}\}\big)$ for algorithms which use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first-order oracle calls by $\tilde{Ω}\big(\frac{\sqrt{d}}{α} + \min\{\frac{1}{α^2}, n\}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show this bound is nearly tight. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension-dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information-limited oracles. Specifically, if the proxy oracle transmits at most $Γ$ bits of information about the gradients in the minibatch, then $Ω\big(\min\{\frac{d}{α^2Γ}, \frac{d}{\log(1/α)}\}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.
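
To get a feel for how the four bounds above scale, the following sketch evaluates the expressions inside the $Ω(\cdot)$ and $\tilde{Ω}(\cdot)$ for concrete values of $d$, $α$, $\bar{m}$, $Γ$, and $n$. It is an illustration only: all constants and polylogarithmic factors are dropped, so the returned numbers indicate growth rates rather than actual query counts, and the function names are hypothetical, chosen here for readability.

```python
import math


def nonsmooth_lower_bound(d, alpha):
    """min{sqrt(d)/alpha^2, d/log(1/alpha)}, constants dropped.

    The abstract states this bound for d >= 1/alpha^2.
    """
    return min(math.sqrt(d) / alpha**2, d / math.log(1 / alpha))


def minibatch_lower_bound(d, alpha, m_bar):
    """min{d/(m_bar*alpha^2), d/log(1/alpha)}, stated for m_bar < sqrt(d)."""
    if not m_bar < math.sqrt(d):
        raise ValueError("the strengthened bound is stated for m_bar < sqrt(d)")
    return min(d / (m_bar * alpha**2), d / math.log(1 / alpha))


def info_limited_lower_bound(d, alpha, gamma_bits):
    """min{d/(alpha^2*Gamma), d/log(1/alpha)} for a Gamma-bit proxy oracle."""
    return min(d / (alpha**2 * gamma_bits), d / math.log(1 / alpha))


def smooth_lower_bound(d, alpha, n):
    """sqrt(d)/alpha + min{1/alpha^2, n}, log factors dropped."""
    return math.sqrt(d) / alpha + min(1 / alpha**2, n)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the paper).
    d, alpha = 10**4, 0.5
    print("non-smooth, private oracle:", nonsmooth_lower_bound(d, alpha))
    print("minibatch size <= 10:      ", minibatch_lower_bound(d, alpha, 10))
    print("10-bit oracle:             ", info_limited_lower_bound(d, alpha, 10))
    print("smooth, n = 1000:          ", smooth_lower_bound(d, alpha, 1000))
```

With these illustrative values, capping the minibatch size at $\bar{m} = 10$ raises the non-smooth expression by a factor of $\sqrt{d}/\bar{m} = 10$, which is the sense in which small-batch strategies are penalized.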
