Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter

Andrew Lowy; Meisam Razaviyayn

TL;DR

Compared with works on uniformly Lipschitz DP SO, the improved excess risk scales with the $k$-th moment bound instead of the worst-case Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data.

Abstract

We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data may be extremely large or infinite. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous (i.e., stochastic gradients are uniformly bounded) over data. While this assumption is convenient, it often leads to pessimistic risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data may be huge due to outliers and/or heavy-tailed data. In such cases, the risk bounds for DP SO, which scale with the worst-case Lipschitz parameter, are vacuous. To address these limitations, we provide improved risk bounds that do not depend on the uniform Lipschitz parameter. Following a recent line of work [WXDX20, KLZ22], we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our risk bounds scale with the $k$-th moment instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. First, for smooth convex loss functions, we provide linear-time algorithms with state-of-the-art excess risk. We complement our excess risk upper bounds with novel lower bounds. In certain parameter regimes, our linear-time excess risk bounds are minimax optimal. Second, we provide the first algorithm to handle non-smooth convex loss functions. To do so, we develop novel algorithmic and stability-based proof techniques, which we believe will be useful for future work in obtaining optimal excess risk. Finally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
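To make the contrast between a moment bound and a worst-case bound concrete, the following minimal Python sketch draws hypothetical heavy-tailed "gradient norms" and compares the empirical second-moment root with the largest sample. The Pareto distribution, the sample size, and the names `kth_moment_root`, `norms`, `r2`, and `worst` are illustrative assumptions, not taken from the paper; the paper's actual moment condition is stated for stochastic gradients of the loss.

```python
import random


def kth_moment_root(norms, k):
    """Empirical (mean of g^k)^(1/k) for a list of nonnegative norms."""
    return (sum(g ** k for g in norms) / len(norms)) ** (1.0 / k)


rng = random.Random(0)

# Hypothetical heavy-tailed gradient norms: Pareto with shape 3,
# chosen only because its second moment is finite.
norms = [rng.paretovariate(3.0) for _ in range(100_000)]

r2 = kth_moment_root(norms, k=2)  # stand-in for a k-th moment bound with k = 2
worst = max(norms)                # empirical stand-in for the worst-case Lipschitz parameter

print(f"2nd-moment root: {r2:.2f}, largest norm: {worst:.2f}")
```

For heavy-tailed samples like these, the largest norm typically keeps growing with the sample size while the moment root stays roughly stable, which is the gap that bounds scaling with the $k$-th moment are meant to exploit.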


Paper Structure (30 sections, 44 theorems, 223 equations, 1 figure, 14 algorithms)

Introduction
Preliminaries
Contributions and Related Work
Private Heavy-Tailed Mean Estimation Building Blocks
Excess Risk Lower Bounds for Smooth (Strongly) Convex Losses
Linear-Time Algorithms for Smooth (Strongly) Convex Losses
Noisy Clipped Accelerated SGD for Smooth Convex Losses
Noisy Clipped SGD for Strongly Convex Losses
Algorithm for Non-Smooth (Strongly) Convex Losses
Localized Noisy Clipped Subgradient Method for Convex Losses
The Strongly Convex Case
Algorithm for Non-Convex Proximal-PL Loss Functions
Concluding Remarks and Open Questions
Additional Discussion of Related Work
Other Bounded Moment Conditions Besides \ref{ass:tilde}
...and 15 more sections

Key Result

Proposition 3

[bun16] If $\mathcal{A}$ is $\rho$-zCDP, then $\mathcal{A}$ is $(\rho + 2\sqrt{\rho \log(1/\delta)}, \delta)$-DP for any $\delta > 0$.
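The conversion in Proposition 3 is easy to evaluate numerically. The sketch below is a minimal illustration, not code from the paper: the function names `zcdp_to_approx_dp` and `gaussian_zcdp` are hypothetical, and the Gaussian-mechanism helper relies on the standard zCDP fact that adding $N(0, \sigma^2 I)$ noise to a query with $\ell_2$-sensitivity $\Delta$ satisfies $\Delta^2/(2\sigma^2)$-zCDP, which is not stated in this section. The code restricts to $0 < \delta < 1$ so that the logarithm is positive.

```python
import math


def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """Return epsilon such that rho-zCDP implies (epsilon, delta)-DP,
    using epsilon = rho + 2 * sqrt(rho * log(1 / delta))."""
    if rho < 0:
        raise ValueError("rho must be nonnegative")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


def gaussian_zcdp(sensitivity: float, sigma: float) -> float:
    """zCDP parameter of the Gaussian mechanism (assumed standard fact):
    rho = sensitivity^2 / (2 * sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return sensitivity ** 2 / (2.0 * sigma ** 2)


# Example: noise scale sigma = 4 on a query with L2-sensitivity 1.
rho = gaussian_zcdp(sensitivity=1.0, sigma=4.0)
eps = zcdp_to_approx_dp(rho, delta=1e-6)
print(f"rho = {rho:.5f}, epsilon = {eps:.3f} at delta = 1e-6")
```

Tracking privacy in zCDP and converting to $(\varepsilon, \delta)$-DP only once at the end is a common design choice, since zCDP parameters add up under composition.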

Figures (1)

Figure 1: Smooth excess risk for $k=2$, $\widetilde{r}_2 = \sqrt{d}$; we omit logarithms. $\kappa = \beta/\mu$ is the condition number of $F$; $\kappa_f = \beta_f/\mu$ is the worst-case condition number of $f(\cdot, x)$. See \ref{thm: localization convex} and \ref{thm: localization strongly convex} for non-smooth upper bounds.

Theorems & Definitions (85)

Example 1
Definition 1: Differential Privacy
Definition 2: Zero-Concentrated Differential Privacy (zCDP)
Proposition 3
Theorem 4: Informal/special cases, see \ref{thm: convex ACSA one pass}, \ref{thm: strongly convex smooth upper bound}, \ref{thm: convex lower bound}, \ref{thm: strongly convex lower bound}, \ref{rem: affine optimal}
Theorem 5: Informal, see \ref{thm: localization convex}, \ref{thm: localization strongly convex}
Lemma 6: bd14
Theorem 7: Smooth Convex, Informal
Theorem 8: Smooth Strongly Convex, Informal
Theorem 9: Informal
...and 75 more
