Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness

Eli Chien; Pan Li

TL;DR

This paper proves that hidden-state Noisy-SGD on a bounded domain enjoys a non-trivial convergent privacy loss even without convexity or smoothness, provided the gradient is Hölder continuous. It does so by extending the shifted Rényi divergence framework with forward Wasserstein distance tracking and a Hölder reduction lemma. This yields convergent RDP bounds for non-convex non-smooth losses and tighter RDP bounds than prior work for smooth strongly convex losses. The results cover full-batch and mini-batch regimes, including subsampling and shuffled mini-batching, and give practical guidance through optimizable shift allocations. The work extends privacy accounting for DP-SGD toward deep learning settings and highlights open problems and future directions, such as tighter bounds and practical implementations.

Abstract

We study the Differential Privacy (DP) guarantee of hidden-state Noisy-SGD algorithms over a bounded domain. Standard privacy analysis for Noisy-SGD assumes all internal states are revealed, which leads to a Rényi DP bound that diverges with the number of iterations. Ye & Shokri (2022) and Altschuler & Talwar (2022) proved convergent bounds for smooth (strongly) convex losses and raised the open question of whether these assumptions can be relaxed. We answer positively by proving a convergent Rényi DP bound for non-convex non-smooth losses, showing that Hölder continuity of the gradient suffices. We also provide a strictly better privacy bound than state-of-the-art results for smooth strongly convex losses. Our analysis improves the shifted divergence analysis in multiple aspects, including forward Wasserstein distance tracking, identification of the optimal shift allocation, and the Hölder reduction lemma. Our results further elucidate the benefit of hidden-state analysis for DP and its applicability.
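To make the divergence concrete, the following Python sketch evaluates the standard accounting that treats every iterate as released: each step is modeled as a Gaussian mechanism, whose RDP at order $\alpha$ is $\alpha \Delta^2 / (2\sigma^2)$, and RDP adds up under composition over $T$ steps. The per-step sensitivity `sensitivity` and noise scale `sigma` are generic placeholders (assumptions), not the paper's exact Noisy-SGD parameterization, and the function names are hypothetical.

```python
def gaussian_rdp(alpha: float, sensitivity: float, sigma: float) -> float:
    """Renyi DP of the Gaussian mechanism at order alpha: alpha * Delta^2 / (2 sigma^2)."""
    if alpha <= 1:
        raise ValueError("Renyi order alpha must be greater than 1")
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)


def composed_rdp(alpha: float, sensitivity: float, sigma: float, num_steps: int) -> float:
    """All iterates revealed: the per-step RDP adds up across num_steps Gaussian steps."""
    return num_steps * gaussian_rdp(alpha, sensitivity, sigma)


# Hypothetical parameters: per-step sensitivity 0.1, noise scale 1.0, order alpha = 2.
for steps in (10, 100, 1000, 10000):
    print(steps, composed_rdp(alpha=2.0, sensitivity=0.1, sigma=1.0, num_steps=steps))
```

The printed bound grows linearly in the number of steps, which is exactly the divergence that a hidden-state analysis seeks to avoid.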

Paper Structure (28 sections, 17 theorems, 61 equations, 2 figures, 1 table)

Introduction
Our Contributions and Analysis Overview
Preliminaries
Our hidden-state DP-SGD privacy loss analysis
Full batch case with smooth losses
Non-smooth losses with Hölder continuous gradient
Mini-batch cases
Related Works
Conclusions and Open Problems
Appendix
Standard Definitions
Privacy guarantees for shuffled cyclic mini-batch
The analysis
Improved bound by shuffling
Privacy guarantees for without-replacement subsampling mini-batches
...and 13 more sections

Key Result

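Lemma 2.3 (Post-processing property)

For any $\alpha > 1$, any function $h$, and any probability distributions $\mu, \nu$: $D_\alpha(h \sharp \mu \,\|\, h \sharp \nu) \leq D_\alpha(\mu \,\|\, \nu)$, where $h \sharp \mu$ denotes the pushforward of $\mu$ under $h$.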
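As a quick numerical sanity check (not a proof) of the post-processing property, the Python sketch below computes the discrete Rényi divergence $D_\alpha(p \,\|\, q) = \frac{1}{\alpha - 1}\log \sum_i p_i^\alpha q_i^{1-\alpha}$ between two distributions and between their pushforwards under a function that merges outcomes. The distributions `mu`, `nu` and the map `h` are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def renyi_divergence(p, q, alpha):
    """Discrete D_alpha(p || q); assumes alpha > 1 and q_i > 0 wherever p_i > 0."""
    total = math.fsum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)


def pushforward(p, h, num_outputs):
    """Distribution of h(X) for X ~ p on {0, ..., len(p) - 1}."""
    out = [0.0] * num_outputs
    for x, px in enumerate(p):
        out[h(x)] += px
    return out


def h(x):
    return x // 2  # merges outcomes {0, 1} -> 0 and {2, 3} -> 1


mu = [0.1, 0.2, 0.3, 0.4]
nu = [0.25, 0.25, 0.25, 0.25]

for alpha in (1.5, 2.0, 5.0):
    before = renyi_divergence(mu, nu, alpha)
    after = renyi_divergence(pushforward(mu, h, 2), pushforward(nu, h, 2), alpha)
    print(alpha, after, before)  # post-processing can only shrink the divergence
```

Merging outcomes discards information, so the divergence between the pushed-forward distributions is never larger, in line with Lemma 2.3.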

Figures (2)

Figure 1: (a) Our RDP guarantees for smooth losses over the bounded domain; the noise variance is the same for all lines. Orange and green lines indicate the cases where the loss is additionally assumed to be (strongly) convex. The output-perturbation baseline directly applies the Gaussian mechanism with sensitivity equal to the diameter of the bounded domain. (b) Detailed comparison of our privacy bound with Altschuler & Talwar (2022) and Ye & Shokri (2022) for smooth strongly convex losses, in the same setting as (a). The detailed setting is deferred to the experimental-setup appendix.
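The output-perturbation baseline in panel (a) can be sketched as follows, assuming the released quantity is the final iterate plus Gaussian noise with the same $\sigma$. Since any two final iterates lie in a domain of diameter $D$, the sensitivity is $D$ and the RDP at order $\alpha$ is $\alpha D^2 / (2\sigma^2)$, independent of the number of iterations. The helper `break_even_steps` is a hypothetical addition that compares this flat baseline with naive composition of per-step Gaussian mechanisms of sensitivity $s$; the parameter values are illustrative.

```python
def output_perturbation_rdp(alpha: float, diameter: float, sigma: float) -> float:
    """RDP of releasing the final iterate plus N(0, sigma^2 I) noise.

    Any two final iterates lie in a domain of diameter D, so this is a Gaussian
    mechanism with sensitivity D; the number of iterations never enters.
    """
    if alpha <= 1:
        raise ValueError("Renyi order alpha must be greater than 1")
    return alpha * diameter ** 2 / (2 * sigma ** 2)


def break_even_steps(diameter: float, step_sensitivity: float) -> float:
    """Steps T at which naive composition T * alpha * s^2 / (2 sigma^2) equals the baseline.

    Both bounds share alpha and sigma, so these cancel and T = D^2 / s^2.
    """
    return diameter ** 2 / step_sensitivity ** 2


# Hypothetical values: alpha = 2, D = 2, sigma = 1, per-step sensitivity 0.1.
print(output_perturbation_rdp(2.0, 2.0, 1.0), break_even_steps(2.0, 0.1))
```

Under these assumptions, past roughly $D^2 / s^2$ iterations the fully revealed composition bound is worse than the flat output-perturbation baseline.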
Figure 2: (a) Our RDP bound for non-smooth losses with $(L,\lambda)$-Hölder continuous gradient, where we empirically estimate the Hölder constant $L$ of a 2-layer Multi-Layer Perceptron (MLP) for each $\lambda$. See the experimental-setup appendix for the detailed setting. (b) Illustration of the overall analysis. The decomposition into parts (A) and (B) was developed by Altschuler & Talwar (2022): it constructs a coupling $(G_t, G_t^\prime)$, resulting in a coupled process $\tilde{W}_t$. Part (A) is handled via standard composition, or via privacy amplification by subsampling in the mini-batch setting. Part (B) is handled by the shifted divergence analysis for smooth convex losses, also known as privacy amplification by iteration, and depends on the $\infty$-Wasserstein distance $W_\infty(W_\tau, W_\tau^\prime)$. Altschuler & Talwar (2022) bound this distance by the domain diameter $D$. In contrast, we perform a careful forward $W_\infty$ distance-tracking analysis (part (C)) that gives a tighter bound and hence a strict improvement of the final privacy loss bound. We further modify the analysis of part (B) so that it applies even to non-convex non-smooth losses with Hölder continuous gradients.
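The interplay of parts (B) and (C) can be illustrated with a simplified Python sketch. It rests on assumptions of ours rather than the paper's exact lemmas: (i) forward tracking under a synchronous coupling (both runs start at the same point and share the Gaussian noise), where a $c$-Lipschitz update perturbed by at most $s$ per step gives $z_{t+1} \le \min(D, c z_t + s)$; (ii) for the last $K$ steps, a shift-reduction cost of $\alpha a_k^2 / (2\sigma^2)$ per Gaussian step and a Lipschitz reduction that scales shifts by $c$, with no data-dependent perturbation, so the shifts must satisfy $\sum_k a_k c^{-k} = z_\tau$. Minimizing the total cost then gives $a_k \propto c^{-k}$ and cost $\alpha z_\tau^2 / (2\sigma^2 \sum_k c^{-2k})$. The helper `gd_lipschitz` uses standard Lipschitz constants of a gradient step ($1 + \eta L$ for $L$-smooth losses, $\max(|1-\eta m|, |1-\eta L|)$ for $m$-strongly convex $L$-smooth losses); all function names are hypothetical.

```python
import math


def gd_lipschitz(eta: float, smooth: float, strong: float = 0.0, convex: bool = False) -> float:
    """Lipschitz constant of x -> x - eta * grad f(x) (standard facts, used as assumptions)."""
    if convex:
        # m-strongly convex (m may be 0) and L-smooth
        return max(abs(1 - eta * strong), abs(1 - eta * smooth))
    # possibly non-convex, L-smooth
    return 1 + eta * smooth


def track_winf(diameter: float, lipschitz: float, step_sensitivity: float, num_steps: int) -> float:
    """Forward upper bound on W_inf(W_t, W_t') under a synchronous coupling, capped at D."""
    z = 0.0  # both runs start from the same initialization
    for _ in range(num_steps):
        # projection is 1-Lipschitz; the update is c-Lipschitz; datasets shift it by <= s
        z = min(diameter, lipschitz * z + step_sensitivity)
    return z


def optimal_shift_cost(alpha, sigma, lipschitz, num_steps, initial_dist):
    """Minimize sum_k alpha * a_k^2 / (2 sigma^2) subject to sum_k a_k * c^(-k) = initial_dist.

    By Cauchy-Schwarz the optimum is a_k = z * w_k / sum_j w_j^2 with w_k = c^(-k).
    Keep num_steps moderate when c < 1, since c^(-k) grows geometrically.
    """
    weights = [lipschitz ** (-k) for k in range(1, num_steps + 1)]
    norm_sq = math.fsum(w * w for w in weights)
    shifts = [initial_dist * w / norm_sq for w in weights]
    cost = math.fsum(alpha * a * a / (2 * sigma ** 2) for a in shifts)
    return shifts, cost


# Hypothetical setting: smooth strongly convex loss on a domain of diameter D = 1.
c = gd_lipschitz(eta=0.1, smooth=2.0, strong=1.0, convex=True)  # max(0.9, 0.8)
z_tau = track_winf(diameter=1.0, lipschitz=c, step_sensitivity=0.02, num_steps=200)
_, cost_tracked = optimal_shift_cost(2.0, 0.5, c, 20, initial_dist=z_tau)
_, cost_diameter = optimal_shift_cost(2.0, 0.5, c, 20, initial_dist=1.0)
print(z_tau, cost_tracked, cost_diameter)
```

Because the cost is quadratic in the starting distance, replacing $D$ by a tracked $z_\tau < D$ shrinks this term by the factor $(z_\tau / D)^2$. For $c > 1$, $\sum_k c^{-2k}$ stays bounded as $K$ grows, so in this simplified model the cost remains bounded rather than vanishing.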

Theorems & Definitions (30)

Definition 2.1: Rényi divergence
Definition 2.2: Rényi differential privacy
Lemma 2.3: Post-processing property
Lemma 2.4: Strong composition for Rényi divergence
Definition 2.5: Hölder continuity
Definition 2.6: $W_\infty$ distance
Definition 2.7: Shifted Rényi divergence
Theorem 3.1: Privacy loss of Noisy-GD with smooth loss
Lemma 3.2: Shift reduction lemma (Altschuler & Talwar, 2022; Feldman et al., 2018)
Lemma 3.3: Lipschitz reduction lemma (Altschuler & Talwar, 2022)
...and 20 more
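Definition 2.5 concerns gradients satisfying $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|^\lambda$, and Figure 2(a) mentions an empirical estimate of $L$ for an MLP. The Python sketch below shows one simple way such an estimate could be formed: the maximum ratio over random pairs, which can only under-estimate the true $L$. It is applied to a hypothetical 1-D toy loss $f(x) = |x|^{1+\lambda}/(1+\lambda)$, whose gradient $\mathrm{sign}(x)|x|^\lambda$ is not Lipschitz at $0$ yet is $(2^{1-\lambda}, \lambda)$-Hölder continuous. This is not the paper's estimation procedure.

```python
import math
import random


def estimate_holder_constant(grad, lam, num_pairs, low=-1.0, high=1.0, seed=0):
    """Empirical lower bound on L in |grad(x) - grad(y)| <= L * |x - y|^lam (1-D)."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(num_pairs):
        x, y = rng.uniform(low, high), rng.uniform(low, high)
        if x == y:
            continue  # ratio undefined for identical points
        best = max(best, abs(grad(x) - grad(y)) / abs(x - y) ** lam)
    return best


def toy_grad(x, lam):
    """Gradient of the hypothetical toy loss |x|^(1+lam) / (1+lam): sign(x) * |x|^lam."""
    return math.copysign(abs(x) ** lam, x)


lam = 0.5
estimate = estimate_holder_constant(lambda x: toy_grad(x, lam), lam, num_pairs=20000)
# The estimate should not exceed the analytic constant 2^(1 - lam), up to rounding.
print(estimate, 2 ** (1 - lam))
```

Pairs of opposite sign and similar magnitude drive the estimate toward $2^{1-\lambda}$; with an MLP one would instead compare parameter gradients at sampled parameter pairs.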