Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems
Nikita Puchkin, Eduard Gorbunov, Nikolay Kutuzov, Alexander Gasnikov
TL;DR
We address stochastic convex optimization under structured heavy-tailed noise by stabilizing stochastic gradients with the smoothed median of means (SMoM). This enables high-probability convergence bounds for clipped-SGD and clipped-SSTM that are faster than the classical $\mathcal{O}(K^{-2(\alpha-1)/\alpha})$ rate. The analysis hinges on decomposing each component noise density into a symmetric part and a lighter-tailed antisymmetric part. This yields bounds on the bias and variance of the SMoM estimators, which translate into improved rates for both clipped-SGD and its accelerated variant in the convex and strongly convex regimes. The results demonstrate practical gains on heavy-tailed problems and point to extensions to non-convex, non-smooth, and distributed settings, with potential further improvements in logarithmic factors. Overall, the structured-noise model and SMoM gradient estimation provide a principled route to faster stochastic optimization under heavy-tailed perturbations.
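For a one-dimensional component density $p$, a symmetric/antisymmetric split of this kind can be written as the standard even/odd decomposition below. This is a sketch that assumes symmetry is taken about zero. The paper's exact structural condition on the antisymmetric part, for example how "lighter-tailed" is quantified, may be stated differently.

$$
p(t) \;=\; \underbrace{\frac{p(t) + p(-t)}{2}}_{p_{\mathrm{s}}(t)} \;+\; \underbrace{\frac{p(t) - p(-t)}{2}}_{p_{\mathrm{a}}(t)},
\qquad p_{\mathrm{s}}(-t) = p_{\mathrm{s}}(t), \quad p_{\mathrm{a}}(-t) = -p_{\mathrm{a}}(t), \quad |p_{\mathrm{a}}(t)| \le p_{\mathrm{s}}(t).
$$

Because $p_{\mathrm{a}}$ is odd and integrable, it integrates to zero, so $p_{\mathrm{s}}$ is itself a probability density. When $p_{\mathrm{a}} \equiv 0$, the noise is symmetric and zero is a median of its distribution. This is the intuition for why median-based gradient estimates can have small bias even when the noise has no finite mean.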
Abstract
We consider stochastic optimization problems with heavy-tailed noise that has a structured density. For such problems, we show that it is possible to obtain faster convergence rates than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$, the classical rate when the stochastic gradients have finite moments of order $\alpha \in (1, 2]$. In particular, our analysis allows the noise norm to have an infinite expectation. To achieve these results, we stabilize stochastic gradients using smoothed medians of means. We prove that the resulting estimates have negligible bias and controllable variance. This allows us to carefully incorporate them into clipped-SGD and clipped-SSTM and derive new high-probability complexity bounds in the considered setup.
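The following is a minimal sketch of the pipeline described in the abstract, written in Python with only the standard library. It assumes one plausible form of the smoothed median of means. A batch of stochastic gradients is split into groups and each group is averaged. The group means then receive small independent Gaussian perturbations (scale `smoothing_std`), and the estimator returns their coordinate-wise median. The paper's exact definition and parameter choices may differ. All names (`smoothed_median_of_means`, `clip`, `clipped_sgd_smom`, `noisy_quadratic_grad`) and parameter values are hypothetical. Only clipped-SGD is shown, not the accelerated clipped-SSTM. The toy objective is $f(x) = \|x\|^2/2$ with standard Cauchy gradient noise, which serves only as an example of noise whose norm has infinite expectation.

```python
import math
import random
import statistics


def smoothed_median_of_means(samples, num_groups, smoothing_std, rng):
    """Coordinate-wise smoothed median of means (ASSUMED form, hypothetical).

    Splits `samples` (a list of equal-length vectors) into `num_groups`
    equal groups, averages each group, adds independent N(0, smoothing_std**2)
    noise to every group mean, and returns the coordinate-wise median.
    Leftover samples that do not fill a whole group are ignored.
    """
    group_size = len(samples) // num_groups
    dim = len(samples[0])
    perturbed_means = []
    for g in range(num_groups):
        group = samples[g * group_size:(g + 1) * group_size]
        mean = [sum(x[j] for x in group) / group_size for j in range(dim)]
        perturbed_means.append([m + rng.gauss(0.0, smoothing_std) for m in mean])
    return [statistics.median([v[j] for v in perturbed_means]) for j in range(dim)]


def clip(vector, level):
    """Rescale `vector` so that its Euclidean norm is at most `level`."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm <= level:
        return list(vector)
    return [(level / norm) * v for v in vector]


def clipped_sgd_smom(grad_oracle, x0, step_size, clip_level, num_iters,
                     batch_size, num_groups, smoothing_std, seed=0):
    """Clipped-SGD where each step uses a clipped SMoM gradient estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(num_iters):
        # Draw a batch of noisy gradients at the current point.
        samples = [grad_oracle(x, rng) for _ in range(batch_size)]
        g = smoothed_median_of_means(samples, num_groups, smoothing_std, rng)
        step = clip(g, clip_level)
        x = [xi - step_size * si for xi, si in zip(x, step)]
    return x


def noisy_quadratic_grad(x, rng):
    """Gradient of f(x) = ||x||^2 / 2 plus i.i.d. standard Cauchy noise."""
    return [xi + math.tan(math.pi * (rng.random() - 0.5)) for xi in x]


if __name__ == "__main__":
    x_final = clipped_sgd_smom(noisy_quadratic_grad, x0=[5.0, -3.0],
                               step_size=0.1, clip_level=1.0, num_iters=300,
                               batch_size=30, num_groups=15, smoothing_std=0.1)
    print(x_final)
```

The iterates should end near the minimizer at the origin. The median step is what controls the Cauchy noise here. The average of i.i.d. standard Cauchy variables is again standard Cauchy, so averaging alone does not reduce the noise, whereas the median of the group means concentrates around the true gradient.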
