Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping
Chenhan Jin, Kaiwen Zhou, Bo Han, James Cheng, Tieyong Zeng
TL;DR
This work tackles differentially private stochastic convex optimization under heavy-tailed data by introducing a one-time gradient clipping scheme, AClip, and a private mean estimator that augments clipping with Gaussian noise. The proposed method, AClipped-dpSGD, achieves high-probability excess-risk bounds for both constrained and unconstrained convex problems, and extends to strongly convex objectives and to non-smooth objectives with Hölder-continuous gradients, while reducing gradient complexity compared to DP-SGD and DP-GD. The authors provide principled tuning guidelines for the clipping level and batch size, and validate the theory with experiments on synthetic and real datasets, including CIFAR-10 with ResNet-18. The results show faster running times and improved privacy-utility trade-offs on heavy-tailed data, which makes the approach practical for large-scale private learning tasks.
Abstract
We consider stochastic convex optimization for heavy-tailed data under differential privacy (DP). Most prior works on differentially private stochastic convex optimization for heavy-tailed data are either restricted to gradient descent (GD) or clip multiple times within stochastic gradient descent (SGD), which is inefficient for large-scale problems. In this paper, we consider a one-time clipping strategy and provide principled analyses of its bias and of private mean estimation. We establish new convergence results and improved complexity bounds for the proposed algorithm, called AClipped-dpSGD, for constrained and unconstrained convex problems. We also extend our convergence analysis to the strongly convex case and to the non-smooth case (which covers generalized smooth objectives with Hölder-continuous gradients). All of the above results hold with high probability for heavy-tailed data. Numerical experiments are conducted to support the theoretical improvements.
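As a rough illustration of the one-time clipping idea described above, the following sketch averages a mini-batch of gradients, clips that average once, and then adds Gaussian noise. It is one reading of the abstract under stated assumptions, not the authors' AClipped-dpSGD: the function names and the parameters `clip_level` and `noise_std` are hypothetical, and the noise scale here is not calibrated to any particular privacy guarantee.

```python
import math
import random


def clip_by_norm(vec, clip_level):
    """Rescale vec so that its Euclidean norm is at most clip_level."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= clip_level or norm == 0.0:
        return list(vec)
    scale = clip_level / norm
    return [x * scale for x in vec]


def averaged_clip_estimate(per_sample_grads, clip_level, noise_std, rng):
    """Average the gradients, clip the average once, add Gaussian noise.

    per_sample_grads: list of equal-length gradient vectors (lists of floats).
    clip_level, noise_std: hypothetical tuning parameters (assumptions).
    rng: a random.Random instance, passed in for reproducibility.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    # Single averaging step over the mini-batch.
    mean_grad = [sum(g[j] for g in per_sample_grads) / n for j in range(dim)]
    # One clipping operation on the averaged gradient.
    clipped = clip_by_norm(mean_grad, clip_level)
    # Independent Gaussian noise on each coordinate.
    return [c + rng.gauss(0.0, noise_std) for c in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy batch with one very large gradient standing in for a heavy tail.
    grads = [[0.1, -0.2], [0.0, 0.3], [50.0, -40.0]]
    print(averaged_clip_estimate(grads, clip_level=1.0, noise_std=0.1, rng=rng))
```

In this toy batch the single large gradient dominates the average, so the clipping step bounds the norm of the update before noise is added. How `clip_level`, `noise_std`, and the batch size should be chosen is the subject of the paper's analysis and is not reproduced here.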
