Privacy of SGD under Gaussian or Heavy-Tailed Noise: Guarantees without Gradient Clipping

Umut Şimşekli; Mert Gürbüzbalaban; Sinan Yıldırım; Lingjiong Zhu

TL;DR

The paper analyzes the differential privacy guarantees of noisy SGD when the injected noise is drawn from an $\alpha$-stable distribution, covering both the Gaussian ($\alpha=2$) and heavy-tailed regimes. It develops a Markov-chain stability approach with carefully crafted Lyapunov functions to bound the TV distance between trajectories on neighboring datasets, yielding a time-uniform $(0,\delta)$-DP guarantee with $\delta=\mathcal{O}(1/n)$. The guarantee requires no gradient clipping and no gradient or iterate projection, only mild regularity and dissipativity assumptions. The results hold for non-convex losses and show that heavy-tailed noise can be a viable alternative to light-tailed noise, with dimension dependence weakening as tails become heavier. The paper also extends the DP guarantees from GD to SGD, discusses the Gaussian case $\alpha=2$ relative to prior Rényi-DP bounds, and provides a unified analytical framework connecting Markov stability with differential privacy.

Abstract

The injection of heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has garnered growing interest in recent years due to its theoretical and empirical benefits for optimization and generalization. However, its implications for privacy preservation remain largely unexplored. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD, when the injected noise follows an $\alpha$-stable distribution, which includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the light-tailed Gaussian distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \mathcal{O}(1/n))$-DP for a broad class of loss functions which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, contrary to prior work that necessitates bounded sensitivity for the gradients or clipping the iterates, our theory can handle unbounded gradients without clipping, and reveals that under mild assumptions, such a projection step is not actually necessary. Our results suggest that, given other benefits of heavy tails in optimization, heavy-tailed noising schemes can be a viable alternative to their light-tailed counterparts.
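
To make the setting concrete, here is a minimal sketch of gradient descent with additive symmetric $\alpha$-stable noise and no clipping or projection. It is an illustration under stated assumptions, not the paper's exact recursion: the function names, the coordinate-wise independent noise, the scale parameter `sigma`, and the ridge-regularized least-squares loss are all hypothetical choices made for the example. The sampler uses the Chambers-Mallows-Stuck method, for which $\alpha=2$ gives Gaussian noise with variance 2.

```python
import numpy as np

def sample_symmetric_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise.

    The samples have characteristic function exp(-|t|^alpha), so
    alpha = 2 yields N(0, 2) and alpha = 1 yields the standard Cauchy law.
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def noisy_gd(grad, theta0, step, sigma, alpha, n_iters, rng):
    """Gradient descent with additive alpha-stable noise.

    Assumed update (hypothetical scaling): theta <- theta - step * grad + sigma * noise.
    The gradient is used as-is: no clipping and no projection of the iterates.
    """
    theta = np.array(theta0, dtype=float)
    for _ in range(n_iters):
        noise = sample_symmetric_alpha_stable(alpha, theta.shape, rng)
        theta = theta - step * grad(theta) + sigma * noise
    return theta

# Hypothetical example: ridge-regularized least squares, whose gradient is
# unbounded in theta (so it has no finite sensitivity without clipping).
rng = np.random.default_rng(0)
n, d = 200, 3
A = rng.normal(size=(n, d))
y = A @ np.ones(d) + 0.1 * rng.normal(size=n)

def grad(theta, lam=0.1):
    return A.T @ (A @ theta - y) / n + lam * theta

theta = noisy_gd(grad, np.zeros(d), step=0.05, sigma=0.01,
                 alpha=1.8, n_iters=500, rng=rng)
print(theta)
```

Setting `alpha=2.0` in the call recovers Gaussian noising, so the same loop covers both regimes discussed in the abstract.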

Paper Structure (35 sections, 19 theorems, 145 equations)

Introduction
Context.
Objective.
Contributions.
Technical Background
Differential privacy and the TV distances
Markov chain stability
Stable distributions
Main Assumptions
Regularity conditions
(Optional) Existence of a universal stable point
Privacy of Noisy Gradient Descent
The design of the Lyapunov functions and the distance between one-step transition kernels.
Estimation of the Lyapunov functions and ergodicity of the Markov chains.
Privacy guarantee for noisy GD.
...and 20 more sections

Key Result

Proposition 3

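Let $\mathcal{A}$ be a randomized algorithm and $\delta \geq 0$. Then $\mathcal{A}$ is $(0,\delta)$-DP if and only if the following stability condition holds: $\mathrm{TV}(\mathcal{A}(X), \mathcal{A}(\hat{X})) \leq \delta$ for every pair of neighboring datasets $X$ and $\hat{X}$, where $\mathrm{TV}$ is the total variation distance between the output distributions.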
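
Proposition 3 ties $(0,\delta)$-DP to a stability condition, read here as a bound on the TV distance between the algorithm's output distributions on neighboring datasets. The sketch below checks this on a toy case with a finite output space. The two probability vectors `p` and `q` are hypothetical stand-ins for those output distributions, and the helper names are illustrative. It compares the TV distance with the smallest $\delta$ satisfying $P(S) \leq Q(S) + \delta$ and $Q(S) \leq P(S) + \delta$ for every event $S$, found by brute force over all subsets.

```python
from itertools import chain, combinations
import math

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def smallest_delta(p, q):
    """Smallest delta with |P(S) - Q(S)| <= delta for every event S.

    Brute force over all 2^k subsets, so only suitable for tiny output spaces.
    """
    k = len(p)
    events = chain.from_iterable(combinations(range(k), r) for r in range(k + 1))
    return max(abs(sum(p[i] - q[i] for i in s)) for s in events)

# Hypothetical output distributions on two neighboring datasets.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.3, 0.3]

tv = tv_distance(p, q)
delta = smallest_delta(p, q)
print(tv, delta, math.isclose(tv, delta))
```

In this toy case the two quantities agree up to floating-point error, which is the finite-output instance of the equivalence: the worst-case event is the set of outcomes where `p` exceeds `q`.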

Theorems & Definitions (23)

Definition 1: $(\epsilon,\delta)$-DP, dwork2014algorithmic
Definition 2: TV distance
Proposition 3
Definition 4: $V$-norm
Definition 5: $V$-uniform ergodicity
Lemma 6: RS2018
Lemma 7
Lemma 8
Lemma 9
Theorem 10
...and 13 more
