
On the Gradient Complexity of Private Optimization with Private Oracles

Michael Menart, Aleksandar Nikolov

TL;DR

The paper establishes minimax lower bounds on the first-order oracle complexity of differentially private optimization of Lipschitz convex losses, using a private proxy oracle framework. It reveals a dimension-dependent runtime penalty induced by privacy and gives near-matching upper bounds via DP-SGD in both the non-smooth and smooth regimes; the proofs use information-theoretic and mean-estimation-inspired arguments to bound the information gained from private responses. It also shows that, for information-limited oracles, the oracle's information capacity fundamentally constrains the optimizer, and it gives reductions between DP-ERM and DP-SCO implying that DP-SCO is no harder than DP-ERM up to polylog factors. Together, the results clarify the computational cost of private optimization beyond the local privacy model and highlight the limitations of gradient quantization and small-batch strategies in privacy-preserving learning.
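For readers unfamiliar with DP-SGD, the upper-bound algorithm mentioned above, here is a minimal sketch of its standard form: sample a minibatch, clip each per-example gradient, average, and add Gaussian noise. The noisy averaged gradient can be read as one "private message about a minibatch of gradients" in the proxy-oracle sense. This is an illustration under stated assumptions, not the paper's implementation: the function names, the noise scale `sigma * clip_norm / batch_size`, and the toy loss are hypothetical, and `sigma` is not calibrated to any $(\epsilon, \delta)$ guarantee.

```python
import numpy as np

def clip(g, clip_norm):
    """Rescale g so that its Euclidean norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g if norm <= clip_norm else g * (clip_norm / norm)

def dp_sgd(per_example_grad, data, x0, steps, batch_size, lr, clip_norm, sigma, rng):
    """Minimal DP-SGD sketch (hypothetical helper; no privacy accounting)."""
    x = np.asarray(x0, dtype=float).copy()
    n = len(data)
    for _ in range(steps):
        # Sample a minibatch of indices without replacement.
        idx = rng.choice(n, size=batch_size, replace=False)
        # Clip each per-example gradient, then average over the minibatch.
        clipped = [clip(per_example_grad(x, data[i]), clip_norm) for i in idx]
        mean_grad = np.mean(clipped, axis=0)
        # Add isotropic Gaussian noise; sigma is an uncalibrated noise multiplier.
        noise = rng.normal(0.0, sigma * clip_norm / batch_size, size=x.shape)
        # The noisy mean is the only gradient information the optimizer uses.
        x = x - lr * (mean_grad + noise)
    return x

def norm_loss_grad(x, z):
    """Subgradient of the 1-Lipschitz, non-smooth convex loss ||x - z||_2."""
    diff = x - z
    norm = np.linalg.norm(diff)
    return diff / norm if norm > 0 else np.zeros_like(diff)

# Toy run with sigma = 0: every data point is (1, 1), so x should approach it.
data = np.ones((10, 2))
x = dp_sgd(norm_loss_grad, data, np.zeros(2), steps=200, batch_size=5,
           lr=0.01, clip_norm=1.0, sigma=0.0, rng=np.random.default_rng(0))
```

Setting `sigma > 0` in the toy run adds the privacy noise; the lower bounds above concern how many such noisy oracle calls are unavoidable.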

Abstract

We study the running time, in terms of first-order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We first consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $Ω(\min\{\frac{\sqrt{d}}{α^2}, \frac{d}{\log(1/α)}\})$ is necessary to achieve $α$ excess risk on problems of dimension $d$ when $d \geq 1/α^2$. Upper bounds via DP-SGD show that these results are tight when $d > \tilde{Ω}(1/α^4)$. We further show that our lower bound can be strengthened to $Ω(\min\{\frac{d}{\bar{m}α^2}, \frac{d}{\log(1/α)}\})$ for algorithms that use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first-order oracle calls by $\tilde{Ω}\big(\frac{\sqrt{d}}{α} + \min\{\frac{1}{α^2}, n\}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show that this bound is nearly tight. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension-dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information-limited oracles. Specifically, if the proxy oracle transmits at most $Γ$ bits of information about the gradients in the minibatch, then $Ω\big(\min\{\frac{d}{α^2Γ}, \frac{d}{\log(1/α)}\}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.
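To see which term of each minimum is active for given $d$ and $α$, the bounds in the abstract can be evaluated numerically. A minimal sketch, assuming every hidden constant equals 1 and dropping the polylog factors hidden by $\tilde{Ω}$, so the numbers are order-of-magnitude comparisons only; the function names are hypothetical.

```python
import math

def nonsmooth_lb(d, alpha):
    """min{sqrt(d)/alpha^2, d/log(1/alpha)}; stated for d >= 1/alpha^2."""
    if d < 1 / alpha**2:
        raise ValueError("bound is stated for d >= 1/alpha^2")
    return min(math.sqrt(d) / alpha**2, d / math.log(1 / alpha))

def small_batch_lb(d, alpha, m_bar):
    """min{d/(m_bar*alpha^2), d/log(1/alpha)} for minibatches of size <= m_bar < sqrt(d)."""
    if not m_bar < math.sqrt(d):
        raise ValueError("bound is stated for m_bar < sqrt(d)")
    return min(d / (m_bar * alpha**2), d / math.log(1 / alpha))

def smooth_lb(d, alpha, n):
    """sqrt(d)/alpha + min{1/alpha^2, n}, with polylog factors dropped."""
    return math.sqrt(d) / alpha + min(1 / alpha**2, n)

def bit_limited_lb(d, alpha, gamma):
    """min{d/(alpha^2*Gamma), d/log(1/alpha)} for a Gamma-bit proxy oracle."""
    return min(d / (alpha**2 * gamma), d / math.log(1 / alpha))

# Illustrative values only: d = 10**6, alpha = 0.1.
for name, value in [("non-smooth", nonsmooth_lb(10**6, 0.1)),
                    ("batch <= 10", small_batch_lb(10**6, 0.1, 10)),
                    ("smooth, n = 50", smooth_lb(10**6, 0.1, 50)),
                    ("Gamma = 10**4", bit_limited_lb(10**6, 0.1, 10**4))]:
    print(f"{name}: {value:.3g}")
```

Comparing `small_batch_lb` against `nonsmooth_lb` for the same $d$ and $α$ shows how restricting the batch size raises the first term from $\sqrt{d}/α^2$ to $d/(\bar{m}α^2)$.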

Paper Structure

This paper contains 48 sections, 26 theorems, 79 equations, and 5 algorithms.

Key Result

Corollary 1

(Informal corollary of Theorem thm:main-lb) Let $\mathcal{A}$ be an $\alpha$-accurate (for non-smooth losses) implementation of DP-SGD with batch size $m$. Then its running time is $\Omega\left(\min\left\{\frac{\sqrt{d} + d/m}{\alpha^2}, \frac{dm}{\log(1/\alpha)}\right\}\right)$.
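The dependence of Corollary 1 on the batch size $m$ can be explored by scanning a few values. A minimal sketch, assuming hidden constants equal 1; `corollary1_lb` is a hypothetical name and the values of $d$ and $\alpha$ are illustrative.

```python
import math

def corollary1_lb(d, alpha, m):
    """min{(sqrt(d) + d/m)/alpha^2, d*m/log(1/alpha)} with constants set to 1."""
    first = (math.sqrt(d) + d / m) / alpha**2
    second = d * m / math.log(1 / alpha)
    return min(first, second)

# Scan batch sizes for d = 10**6, alpha = 0.1 (illustrative values).
for m in (1, 10, 100, 1000, 10000):
    print(m, f"{corollary1_lb(10**6, 0.1, m):.3g}")
```

Once $m \geq \sqrt{d}$, the $d/m$ term is at most $\sqrt{d}$, so the first term is within a factor of 2 of $\sqrt{d}/\alpha^2$; for smaller batches the $d/m$ term dominates.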

Theorems & Definitions (51)

  • Corollary 1
  • Definition 1: Information Capacity [cover-thomas-elements]
  • Definition 2: First Order Oracle
  • Definition 3: Proxy Oracle
  • Theorem 1
  • Theorem 2
  • Remark 1
  • Lemma 1
  • Proof
  • Lemma 2
  • ...and 41 more
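Definitions 2 and 3 and the abstract's $Γ$-bit result concern a proxy oracle that forwards limited information about a minibatch of gradients. The sketch below shows one simple information-limited proxy oracle of that kind, a sign quantizer; it is a hypothetical illustration, not the construction used in the paper's proofs. It averages the minibatch gradients and reveals only the signs of `gamma` coordinates chosen with data-independent randomness, so the data-dependent part of each message is at most $Γ$ bits. All names and the `scale` parameter are assumptions.

```python
import numpy as np

def gamma_bit_proxy_oracle(grads, gamma, rng):
    """Hypothetical Gamma-bit proxy oracle (sign quantizer).

    grads: array of shape (batch_size, d) holding the minibatch gradients.
    Returns coordinate indices (chosen independently of the data) and one
    sign bit per chosen coordinate, i.e. at most gamma data-dependent bits.
    """
    g = np.mean(grads, axis=0)
    d = g.shape[0]
    coords = rng.choice(d, size=min(gamma, d), replace=False)
    signs = np.where(g[coords] >= 0, 1, -1).astype(np.int8)
    return coords, signs

def decode(coords, signs, d, scale):
    """Optimizer-side decoding into a sparse, sign-only gradient estimate."""
    est = np.zeros(d)
    est[coords] = scale * signs
    return est

# Usage: a two-example minibatch in d = 4, with a 2-bit budget.
grads = np.array([[1.0, -2.0, 3.0, -4.0],
                  [3.0, -2.0, 1.0, -4.0]])
coords, signs = gamma_bit_proxy_oracle(grads, 2, np.random.default_rng(0))
estimate = decode(coords, signs, d=4, scale=0.5)
```

With a fixed budget $Γ$, each call reveals sign information about at most $Γ$ of the $d$ coordinates, which gives some intuition for why the abstract's bound scales as $d/(α^2Γ)$ in the regime where that term of the minimum is active.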