How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization

Andrew Lowy; Jonathan Ullman; Stephen J. Wright

TL;DR

The paper develops a private warm-start framework for non-convex DP optimization that composes a private approximate risk minimizer with a private stationary-point finder to obtain faster convergence to α-stationary points. It provides new DP rates for empirical losses, achieving near-optimal performance for quasar-convex and KL* losses, and extends to second-order guarantees. The framework yields improved rates for population losses and non-convex GLMs, including dimension-free results via Johnson-Lindenstrauss reductions. Empirical results illustrate privacy-driven improvements in high-privacy regimes and validate the practical relevance of the proposed approach. Overall, the work advances both theoretical and practical understanding of DP non-convex optimization across multiple problem classes.

Abstract

We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions. First, we obtain improved rates for finding stationary points of smooth non-convex empirical loss functions. Second, we specialize to quasar-convex functions, which generalize star-convex functions and arise in learning dynamical systems and training some neural nets. We achieve the optimal rate for this class. Third, we give an optimal algorithm for finding stationary points of functions satisfying the Kurdyka-Łojasiewicz (KL) condition. For example, over-parameterized neural networks often satisfy this condition. Fourth, we provide new state-of-the-art rates for stationary points of non-convex population loss functions. Fifth, we obtain improved rates for non-convex generalized linear models. A modification of our algorithm achieves nearly the same rates for second-order stationary points of functions with Lipschitz Hessian, improving over the previous state-of-the-art for each of the above problems.
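To make the warm-start idea concrete, here is a minimal sketch in Python, not the paper's algorithms. It assumes noisy gradient descent as a stand-in for both the private risk minimizer and the private stationary-point finder, a hypothetical non-convex loss $(\sigma(x^\top w) - y)^2$, the classical Gaussian mechanism with per-sample gradient clipping, and an even budget split between the two phases. All function names, step counts, and learning rates are illustrative choices.

```python
import numpy as np

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (valid for 0 < epsilon < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def clipped_mean_gradient(w, X, y, clip):
    """Mean of per-sample gradients of (sigmoid(x.w) - y)^2, each clipped to norm <= clip."""
    s = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_sample = (2.0 * (s - y) * s * (1.0 - s))[:, None] * X
    norms = np.linalg.norm(per_sample, axis=1, keepdims=True)
    per_sample *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    return per_sample.mean(axis=0)

def noisy_gd(w, X, y, epsilon, delta, steps, lr, clip, rng):
    """Hypothetical DP subroutine: noisy gradient descent.

    Each step is a Gaussian mechanism with budget (epsilon/steps, delta/steps);
    basic composition over the steps gives (epsilon, delta) in total.
    """
    n = X.shape[0]
    # Replacing one sample changes the clipped mean gradient by at most 2*clip/n.
    sigma = gaussian_sigma(2.0 * clip / n, epsilon / steps, delta / steps)
    for _ in range(steps):
        noise = rng.normal(0.0, sigma, size=w.shape)
        w = w - lr * (clipped_mean_gradient(w, X, y, clip) + noise)
    return w

def warm_start_pipeline(X, y, epsilon, delta, rng):
    w0 = np.zeros(X.shape[1])
    # Phase 1: stand-in for the private approximate risk minimizer (half the budget).
    w_warm = noisy_gd(w0, X, y, epsilon / 2, delta / 2, steps=20, lr=0.5, clip=1.0, rng=rng)
    # Phase 2: stand-in for the private stationary-point finder, started at w_warm.
    return noisy_gd(w_warm, X, y, epsilon / 2, delta / 2, steps=20, lr=0.1, clip=1.0, rng=rng)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                 # synthetic data, for illustration only
y = (X @ np.ones(5) > 0).astype(float)
w_priv = warm_start_pipeline(X, y, epsilon=1.0, delta=1e-5, rng=rng)
# Diagnostic only: this gradient norm is computed without noise and is not private.
print(np.linalg.norm(clipped_mean_gradient(w_priv, X, y, clip=1.0)))
```

The point of the sketch is structural: the second phase receives the first phase's output as its initial point, and the two phases together consume the full $(\varepsilon, \delta)$ budget. The rates in the abstract depend on the specific subroutines chosen for each phase, which this stand-in does not reproduce.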

Paper Structure (43 sections, 24 theorems, 53 equations, 3 figures, 4 algorithms)

Introduction
Stationary Points of Empirical Loss Functions.
Contribution 1.
Contribution 2.
Second-Order Stationary Points.
Contribution 3.
Stationary Points of Population Loss Functions.
Contribution 4.
Contribution 5.
Our Approach
Roadmap
Preliminaries
Assumptions and Notation.
Differential Privacy.
Our Warm-Start Algorithmic Framework
...and 28 more sections

Key Result

Lemma 2.6

If $\mathcal{A}$ is $(\varepsilon_1, \delta_1)$-DP and $\mathcal{B}$ is $(\varepsilon_2, \delta_2)$-DP, then $\mathcal{B} \circ \mathcal{A}$ is $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-DP.
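As a small illustration of how this lemma supports a two-stage pipeline, the sketch below tracks privacy budgets as $(\varepsilon, \delta)$ pairs. The `PrivacyBudget` type, the helper names, and the even split are assumptions made for this example; the lemma itself only states that the parameters add.

```python
from typing import List, NamedTuple

class PrivacyBudget(NamedTuple):
    epsilon: float
    delta: float

def compose(first: PrivacyBudget, second: PrivacyBudget) -> PrivacyBudget:
    """Basic composition: running B on the output of A adds the parameters."""
    return PrivacyBudget(first.epsilon + second.epsilon, first.delta + second.delta)

def split_evenly(total: PrivacyBudget, parts: int = 2) -> List[PrivacyBudget]:
    """Divide a total budget into equal shares (one illustrative allocation)."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return [PrivacyBudget(total.epsilon / parts, total.delta / parts) for _ in range(parts)]

total = PrivacyBudget(epsilon=1.0, delta=1e-6)
warm_start_budget, finder_budget = split_evenly(total)   # budgets for A and for B
combined = compose(warm_start_budget, finder_budget)     # guarantee for B after A
```

Here `combined` recovers the original total, so giving the warm-start algorithm and the stationary-point finder half the budget each keeps the composed algorithm within $(\varepsilon, \delta)$.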

Figures (3)

Figure 1: Summary of results for second-order stationary points (SOSP). All bounds should be read as $\min(1, ...)$. SOTA = state-of-the-art. $\zeta := 1 \wedge \left(\frac{d}{\varepsilon n} + \sqrt{\frac{d}{n}}\right)$. $r := \hbox{rank}(X)$. We omit logarithms and the Lipschitz and smoothness parameters. The GLM algorithm of arora2022faster finds only FOSP, not SOSP.
Figure 2: Training Loss: Gradient Norm vs. $\varepsilon$
Figure 3: Test Loss: Gradient Norm vs. $\varepsilon$

Theorems & Definitions (42)

Definition 2.1: Lipschitz continuity
Definition 2.2: Smoothness
Definition 2.4: Stationary Points
Definition 2.5: Differential Privacy dwork2006calibrating
Lemma 2.6: Basic Composition
Lemma 3.1
Theorem 3.2: First-Order Stationary Points for ERM: Meta-Algorithm
proof
Lemma 3.3
Theorem 3.4: Second-order Stationary Points for ERM: Meta-Algorithm
...and 32 more
