Noise is All You Need: Private Second-Order Convergence of Noisy SGD
Dmitrii Avdiukhin, Michael Dinitz, Chenglin Fan, Grigory Yaroslavtsev
TL;DR
This paper shows that DP-SGD without gradient clipping achieves second-order convergence in non-convex optimization under standard smoothness assumptions, with the privacy noise playing a constructive role. The authors prove that DP-SGD finds an $\alpha$-second-order stationary point ($\alpha$-SOSP) with $\alpha = \tilde{\Omega}\left( \frac{d^{1/4}}{\sqrt{n\varepsilon}} \right)$ using $\tilde{O}\left( \frac{\sigma^2}{\alpha^4} \right)$ stochastic gradient calls while ensuring $(\varepsilon,\delta)$-DP, under the condition $\Delta^2 \ge \sigma^2/B$. The work favors the simplicity of plain DP-SGD over more complex private second-order methods, and shows empirically that privacy budgets such as $\varepsilon \in \{2,4,8\}$ yield performance close to non-private SGD on standard benchmarks. Taken together, the results make private optimization more practical: the noise added for privacy is by itself enough to drive second-order convergence in DP-SGD.
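For readers unfamiliar with the term, the following is the convention for an $\alpha$-SOSP that is common in the non-private literature on escaping saddle points. It is given here as an assumption: the paper's own definition may differ in constants, and $\rho$ (the Lipschitz constant of the Hessian of the objective $f$) is a symbol introduced only for this illustration.

$$
\|\nabla f(x)\| \le \alpha
\qquad \text{and} \qquad
\lambda_{\min}\left(\nabla^2 f(x)\right) \ge -\sqrt{\rho\,\alpha}.
$$

The first condition alone describes a first-order stationary point, which may be a saddle. The second condition rules out points where the Hessian has a significantly negative eigenvalue, that is, strict saddle points.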
Abstract
Private optimization is a topic of major interest in machine learning, with differentially private stochastic gradient descent (DP-SGD) playing a key role in both theory and practice. Furthermore, DP-SGD is known to be a powerful tool in contexts beyond privacy, including robustness and machine unlearning. Existing analyses of DP-SGD either make relatively strong assumptions (e.g., Lipschitz continuity of the loss function, or even convexity) or prove only first-order convergence (and thus might end at a saddle point in the non-convex setting). At the same time, there has been progress in proving second-order convergence of the non-private version of "noisy SGD", as well as progress in designing algorithms that are more complex than DP-SGD and do guarantee second-order convergence. We revisit DP-SGD and show that "noise is all you need": the noise necessary for privacy already implies second-order convergence under the standard smoothness assumptions, even for non-Lipschitz loss functions. Hence, we get second-order convergence essentially for free: DP-SGD, the workhorse of modern private optimization, under minimal assumptions can be used to find a second-order stationary point.
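The intuition that added noise helps an iterate leave a saddle point can be seen on a toy problem. The sketch below is not the paper's algorithm and performs no privacy accounting: it runs gradient descent with optional Gaussian noise on a hand-picked two-dimensional function with a strict saddle at the origin. The function, the step size, the noise level, and the names `grad`, `hessian_min_eig`, and `noisy_gd` are all assumptions chosen for illustration.

```python
import numpy as np

def grad(w):
    # Gradient of the toy objective f(x, y) = x^4/4 - x^2/2 + y^2/2,
    # which has a strict saddle at (0, 0) and minima at (+1, 0), (-1, 0).
    x, y = w
    return np.array([x**3 - x, y])

def hessian_min_eig(w):
    # The Hessian is diag(3x^2 - 1, 1), so its smallest eigenvalue
    # is the smaller of the two diagonal entries.
    x, _ = w
    return min(3.0 * x**2 - 1.0, 1.0)

def noisy_gd(w0, step=0.05, noise_std=0.0, iters=2000, seed=0):
    # Gradient descent where isotropic Gaussian noise is added to every
    # gradient. The noise scale is illustrative only and is NOT
    # calibrated to any (epsilon, delta) privacy guarantee.
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        noise = rng.normal(0.0, noise_std, size=w.shape)
        w = w - step * (grad(w) + noise)
    return w

# Start exactly at the saddle point, where the gradient is zero.
plain = noisy_gd([0.0, 0.0], noise_std=0.0)
noisy = noisy_gd([0.0, 0.0], noise_std=0.1)

print("noiseless:", plain, "min Hessian eigenvalue:", hessian_min_eig(plain))
print("noisy:    ", noisy, "min Hessian eigenvalue:", hessian_min_eig(noisy))
```

Without noise the iterate never moves, because the gradient at the saddle is exactly zero even though the Hessian has a negative eigenvalue there. With noise, the iterate is pushed along the negative-curvature direction and settles near one of the two minima, where the smallest Hessian eigenvalue is positive.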
