Privacy of SGD under Gaussian or Heavy-Tailed Noise: Guarantees without Gradient Clipping
Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yıldırım, Lingjiong Zhu
TL;DR
The paper analyzes the differential privacy (DP) guarantees of noisy SGD when the injected noise follows an $\alpha$-stable distribution, covering both the Gaussian case ($\alpha=2$) and heavy-tailed regimes ($\alpha<2$). It develops a Markov-chain stability approach with carefully crafted Lyapunov functions to bound the total variation (TV) distance between the trajectories obtained on neighboring datasets. Under mild regularity and dissipativity assumptions, this yields a time-uniform $(0,\delta)$-DP guarantee with $\delta=\mathcal{O}(1/n)$, without gradient clipping or projection of the gradients or iterates. The results hold for non-convex losses and show that heavy-tailed noise can be a viable alternative to light-tailed noise, with the dimension dependence weakening as the tails become heavier. The paper also extends the DP guarantees from GD to SGD, discusses how its Gaussian case ($\alpha=2$) relates to prior Rényi-DP bounds, and provides a unified analytical framework connecting Markov-chain stability with differential privacy.
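The link between a TV bound and $(0,\delta)$-DP follows directly from the standard definition of DP. In the notation below, which is chosen here for illustration and is not taken from the paper, $\mathcal{A}$ is a randomized algorithm, $X$ and $X'$ are neighboring datasets of size $n$ that differ in one element, $S$ ranges over measurable sets, and $\theta_k$, $\theta_k'$ are the noisy SGD iterates run on $X$ and $X'$:

$$
\begin{aligned}
&(\varepsilon,\delta)\text{-DP:}\quad \mathbb{P}\bigl(\mathcal{A}(X)\in S\bigr) \le e^{\varepsilon}\,\mathbb{P}\bigl(\mathcal{A}(X')\in S\bigr) + \delta \quad \text{for all neighbors } X, X' \text{ and all } S,\\
&\varepsilon = 0:\quad \sup_{S}\bigl|\mathbb{P}\bigl(\mathcal{A}(X)\in S\bigr) - \mathbb{P}\bigl(\mathcal{A}(X')\in S\bigr)\bigr| = d_{\mathrm{TV}}\bigl(\mathcal{A}(X), \mathcal{A}(X')\bigr) \le \delta,\\
&\text{time-uniform:}\quad \sup_{k\ge 0}\, d_{\mathrm{TV}}\bigl(\mathrm{Law}(\theta_k), \mathrm{Law}(\theta_k')\bigr) \le \delta.
\end{aligned}
$$

The second line uses the fact that the neighboring relation is symmetric, so the one-sided inequality holds in both directions. The third line says that releasing the iterate $\theta_k$ is $(0,\delta)$-DP for every $k$. The paper's claim is that this holds with $\delta=\mathcal{O}(1/n)$.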
Abstract
The injection of heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has garnered growing interest in recent years due to its theoretical and empirical benefits for optimization and generalization. However, its implications for privacy preservation remain largely unexplored. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD when the injected noise follows an $\alpha$-stable distribution. This family includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the light-tailed Gaussian distribution. Within the $(\varepsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \mathcal{O}(1/n))$-DP for a broad class of loss functions, which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, our theory handles unbounded gradients without clipping, in contrast to prior work that requires bounded gradient sensitivity or clipping of the iterates. It also reveals that, under mild assumptions, such a projection step is not actually necessary. Our results suggest that, given the other benefits of heavy tails in optimization, heavy-tailed noising schemes can be a viable alternative to their light-tailed counterparts.
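The noisy SGD recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. All of the following are hypothetical and chosen only for illustration:

* the function names `sample_sym_stable` and `noisy_sgd`;
* the parameters `step`, `sigma`, and `batch_size`;
* the per-sample loss $\frac12\|\theta - x\|^2$, picked because it is smooth and dissipative.

The noise is drawn i.i.d. per coordinate, which may differ from the $\alpha$-stable noise model used in the paper. Symmetric $\alpha$-stable variates are generated with the Chambers-Mallows-Stuck method. Note that the update uses the raw minibatch gradient, with no clipping or projection step.

```python
import math
import random


def sample_sym_stable(alpha: float, rng: random.Random) -> float:
    """Draw one symmetric alpha-stable variate (skewness 0, unit scale)
    via the Chambers-Mallows-Stuck method. alpha = 2 gives N(0, 2)."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # standard exponential
    while w == 0.0:                             # guard against a zero draw
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)                      # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def noisy_sgd(data, theta0, step, sigma, alpha, batch_size, n_iters, seed=0):
    """Hypothetical sketch of noisy SGD with alpha-stable perturbations:
        theta <- theta - step * minibatch_grad(theta) + sigma * xi,
    where xi has i.i.d. symmetric alpha-stable coordinates.
    Illustrative loss per data point x: 0.5 * ||theta - x||^2.
    No gradient clipping and no projection of the iterates."""
    rng = random.Random(seed)
    theta = list(theta0)
    n = len(data)
    for _ in range(n_iters):
        batch = rng.sample(range(n), batch_size)  # minibatch without replacement
        grad = [0.0] * len(theta)
        for i in batch:
            for j, (t, x) in enumerate(zip(theta, data[i])):
                grad[j] += (t - x) / batch_size   # gradient of 0.5 * (t - x)^2
        theta = [t - step * g + sigma * sample_sym_stable(alpha, rng)
                 for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0], [0.0, 1.0]]
    for alpha in (2.0, 1.5, 1.0):
        theta = noisy_sgd(data, [0.0, 0.0], step=0.1, sigma=0.01,
                          alpha=alpha, batch_size=2, n_iters=500)
        print(f"alpha={alpha}: theta={theta}")
```

Under this unit-scale convention, `alpha=2.0` gives Gaussian noise with variance $2\sigma^2$ per coordinate. Any `alpha < 2` gives infinite-variance perturbations whose occasional large jumps are characteristic of the heavy-tailed regime.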
