Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates
Puning Zhao, Jiafei Wu, Zhe Liu, Chong Wang, Rongfei Fan, Qingming Li
TL;DR
The paper addresses stochastic convex optimization under differential privacy (DP) with heavy-tailed gradient noise. It shows that prior gradient estimators have suboptimal tail behavior, which costs a superfluous factor of $d$ in the union bound. It proposes two methods, simple clipping and iterative updating, and derives high-probability, tail-aware risk bounds. The simple clipping method attains the rate $\tilde{O}\left(\sqrt{d/n} + \sqrt{d}\left(\frac{\sqrt{d}}{n\epsilon}\right)^{1-1/p}\right)$ only for small privacy budgets, $\epsilon \leq 1/\sqrt{d}$, and incurs an extra term in general. The iterative updating method attains this rate for all $\epsilon \leq 1$, matching the Kamath et al. lower bound up to logarithmic factors. The core innovations include refined DP mean estimation under heavy tails and privacy amplification via shuffling, which together give DP stochastic optimization guarantees when gradients have only bounded $p$-th order moments.
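To get a feel for the two terms in the rate above, the following minimal sketch evaluates them numerically. It drops the constants and logarithmic factors hidden by $\tilde{O}$, so only the relative size of the terms is meaningful. The function name `rate_terms` is hypothetical and does not come from the paper.

```python
import math


def rate_terms(n, d, epsilon, p):
    """Return (non_private, private) terms of the rate, ignoring constants and logs.

    n: number of samples, d: dimension, epsilon: privacy budget,
    p: order of the bounded gradient moment (p > 1 assumed here).
    """
    # Statistical term sqrt(d / n), present even without privacy.
    non_private = math.sqrt(d / n)
    # Privacy term sqrt(d) * (sqrt(d) / (n * epsilon)) ** (1 - 1/p).
    private = math.sqrt(d) * (math.sqrt(d) / (n * epsilon)) ** (1.0 - 1.0 / p)
    return non_private, private


if __name__ == "__main__":
    for p in (2, 4, 8):
        print(p, rate_terms(n=10_000, d=100, epsilon=1.0, p=p))
```

As $p$ grows, the exponent $1-1/p$ approaches 1 and the privacy term approaches $d/(n\epsilon)$; smaller $p$ (heavier tails) makes the privacy term decay more slowly in $n$.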
Abstract
We study convex optimization problems under differential privacy (DP). With heavy-tailed gradients, existing works achieve suboptimal rates. The main obstacle is that existing gradient estimators have suboptimal tail properties, resulting in a superfluous factor of $d$ in the union bound. In this paper, we explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients. Our first method is a simple clipping approach. Under bounded $p$-th order moments of gradients, with $n$ samples, it achieves $\tilde{O}\left(\sqrt{d/n}+\sqrt{d}\left(\frac{\sqrt{d}}{n\epsilon}\right)^{1-1/p}\right)$ population risk when $\epsilon\leq 1/\sqrt{d}$. We then propose an iterative updating method, which is more complex but achieves this rate for all $\epsilon\leq 1$. The results significantly improve over existing methods. Such improvement relies on a careful treatment of the tail behavior of gradient estimators. Our results match the minimax lower bound in \cite{kamath2022improved}, indicating that the theoretical limit of stochastic convex optimization under DP is achievable.
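The general idea behind a clipping-based private gradient estimate can be sketched as below. This is a generic illustration, not the paper's estimator: it assumes $(\epsilon,\delta)$-DP with the classical Gaussian mechanism (valid for $\epsilon < 1$), and the names `clip`, `private_clipped_mean`, `radius`, and `delta` are hypothetical. The abstract does not specify the clipping threshold or the noise mechanism.

```python
import math
import random


def clip(g, radius):
    """Project vector g onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= radius:
        return list(g)
    scale = radius / norm
    return [x * scale for x in g]


def private_clipped_mean(grads, radius, epsilon, delta, rng):
    """Average clipped per-sample gradients and add Gaussian noise.

    Assumption: classical Gaussian mechanism, which requires epsilon < 1.
    """
    n = len(grads)
    d = len(grads[0])
    clipped = [clip(g, radius) for g in grads]
    mean = [sum(c[j] for c in clipped) / n for j in range(d)]
    # Replacing one sample moves the clipped mean by at most 2 * radius / n in L2.
    sensitivity = 2.0 * radius / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return [m + rng.gauss(0.0, sigma) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    # Heavy-tailed toy gradients: Pareto-distributed magnitudes with random signs.
    grads = [[rng.choice((-1, 1)) * rng.paretovariate(2.5) for _ in range(5)]
             for _ in range(1000)]
    print(private_clipped_mean(grads, radius=3.0, epsilon=0.5, delta=1e-5, rng=rng))
```

Clipping bounds the influence of any single heavy-tailed sample at the cost of some bias. A larger `radius` reduces that bias but increases the noise scale, which is the trade-off that the choice of threshold has to balance.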
