Noise is All You Need: Private Second-Order Convergence of Noisy SGD

Dmitrii Avdiukhin, Michael Dinitz, Chenglin Fan, Grigory Yaroslavtsev

TL;DR

Noise is All You Need demonstrates that DP-SGD without gradient clipping can achieve second-order convergence in non-convex optimization under minimal smoothness assumptions, with privacy noise playing a constructive role. The authors prove that DP-SGD attains an $\alpha$-SOSP with $\alpha = \tilde{\Omega}\left( \frac{d^{1/4}}{\sqrt{n\varepsilon}} \right)$ and requires $\tilde{O}\left( \frac{\sigma^2}{\alpha^4} \right)$ stochastic gradient calls while ensuring $(\varepsilon,\delta)$-DP, under the condition $\Delta^2 \ge \sigma^2/B$. This work emphasizes simplicity over more complex private second-order methods and shows empirically that privacy budgets like $\varepsilon \in \{2,4,8\}$ yield performance close to non-private SGD on standard benchmarks. Collectively, the results broaden the practicality of private optimization by showing that noise alone can drive both privacy and second-order convergence in DP-SGD.
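For readers unfamiliar with the notation, the stationarity notions referenced above are conventionally stated as follows. This is the standard textbook form, assuming the objective $f$ has a $\rho$-Lipschitz Hessian; the symbol $\rho$ and the exact constants are assumptions here and may differ from the paper's Definitions 2.3 and 2.4.

```latex
% Conventional definitions; \rho (Hessian Lipschitz constant) is an assumed symbol.
\begin{align*}
  \text{$\alpha$-FOSP:} \quad & \|\nabla f(x)\| \le \alpha, \\
  \text{$\alpha$-SOSP:} \quad & \|\nabla f(x)\| \le \alpha
      \quad \text{and} \quad
      \lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\sqrt{\rho\,\alpha}.
\end{align*}
```

Under this convention a strict saddle point can satisfy the first-order condition but fails the second-order one, which is why first-order guarantees alone do not rule out ending at a saddle.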

Abstract

Private optimization is a topic of major interest in machine learning, with differentially private stochastic gradient descent (DP-SGD) playing a key role in both theory and practice. Furthermore, DP-SGD is known to be a powerful tool in contexts beyond privacy, including robustness, machine unlearning, etc. Existing analyses of DP-SGD either make relatively strong assumptions (e.g., Lipschitz continuity of the loss function, or even convexity) or prove only first-order convergence (and thus might end at a saddle point in the non-convex setting). At the same time, there has been progress in proving second-order convergence of the non-private version of "noisy SGD", as well as progress in designing algorithms that are more complex than DP-SGD and do guarantee second-order convergence. We revisit DP-SGD and show that "noise is all you need": the noise necessary for privacy already implies second-order convergence under the standard smoothness assumptions, even for non-Lipschitz loss functions. Hence, we get second-order convergence essentially for free: DP-SGD, the workhorse of modern private optimization, under minimal assumptions can be used to find a second-order stationary point.
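The intuition that injected noise can move an iterate off a saddle point can be illustrated with a toy sketch. This is not the paper's algorithm or its privacy calibration: the objective, the function names `grad` and `noisy_gd`, and the values of `lr`, `sigma`, and `steps` are all hypothetical choices for illustration, and the exact gradient stands in for a minibatch gradient. The toy objective $f(x) = \tfrac12 x_0^2 - \tfrac12 x_1^2 + \tfrac14 x_1^4$ has a saddle at the origin and minima at $(0, \pm 1)$.

```python
import random


def grad(x):
    """Gradient of the toy objective f(x) = 0.5*x0^2 - 0.5*x1^2 + 0.25*x1^4."""
    return [x[0], x[1] ** 3 - x[1]]


def noisy_gd(x0, lr=0.1, sigma=0.05, steps=2000, seed=0):
    """Gradient descent with Gaussian noise added to each gradient coordinate.

    No gradient clipping is applied. sigma is an arbitrary illustrative
    value, NOT a noise scale calibrated to any (epsilon, delta) guarantee.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Perturb every coordinate of the gradient before taking the step.
        x = [xi - lr * (gi + rng.gauss(0.0, sigma)) for xi, gi in zip(x, g)]
    return x


# Without noise, an iterate started exactly at the saddle never moves.
plain = noisy_gd([0.0, 0.0], sigma=0.0)
# With noise, the iterate drifts along the negative-curvature direction x1
# and settles near one of the two minima.
noisy = noisy_gd([0.0, 0.0], sigma=0.05)
print("noiseless:", plain)
print("noisy:    ", noisy)
```

In this sketch the noiseless run stays at the origin because the gradient there is exactly zero, while the noisy run ends with $|x_1|$ close to 1. Which of the two minima it reaches depends on the seed.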

Paper Structure

This paper contains 28 sections, 19 theorems, 57 equations, 3 figures, 2 algorithms.

Key Result

Theorem 1.1

Under the standard assumptions of non-convex optimization -- in particular, without the Lipschitz condition -- noisy SGD (without gradient clipping) is $(\varepsilon,\delta)$-differentially private and for any $\alpha = \tilde{\Omega} \left( d^{1/4}/\sqrt{n \varepsilon} \right)$ finds an $\alpha$-se

Figures (3)

  • Figure 1: Testing accuracy of DP-SGD on the CIFAR-10 dataset for various choices of $\varepsilon$. Testing accuracy is averaged over $10$ runs, with the shaded area showing the minimum and maximum values over the runs.
  • Figure 2: Testing accuracy of DP-SGD on the CIFAR-100 and CoLA datasets. Testing accuracy is averaged over $10$ runs, with the shaded area showing the minimum and maximum values.
  • Figure 3: Additional results for training a convolutional neural network using DP-SGD on the CIFAR-10 dataset.

Theorems & Definitions (41)

  • Theorem 1.1: Informal; see Theorem \ref{thm:combine_main}
  • Definition 2.1: Differential Privacy [DBLP:conf/eurocrypt/DworkKMMN06]
  • Definition 2.2
  • Definition 2.3: $\alpha$-FOSP
  • Definition 2.4: $\alpha$-SOSP [nesterovP06]
  • Theorem 3.1: Standard DP for clipped SGD [abadi_DeepLearning_2016]
  • Lemma 3.2: Bound on the gradient; Proposition 4.3 from [DBLP:conf/iclr/FangLF023], simplified
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • ...and 31 more