Accelerated stochastic first-order method for convex optimization under heavy-tailed noise
Chuan He, Zhaosong Lu
TL;DR
This paper studies convex composite optimization under heavy-tailed stochastic noise. The objective is $F(x)=f(x)+h(x)$, where $f$ satisfies a hybrid smoothness condition and the proximal operator of $h$ can be computed exactly. The paper proves that an accelerated variant of the vanilla stochastic proximal subgradient method (SPGM), applied without clipping or normalization, achieves first-order oracle complexity that is universally optimal across the smooth, weakly smooth, and nonsmooth regimes under heavy-tailed noise, and it extends these results to high-probability guarantees. The core contributions are explicit complexity bounds for SPGM and its accelerated variant, which show that acceleration yields optimal rates even when the gradient estimators have unbounded variance. These results connect universal gradient methods with stochastic optimization under heavy-tailed noise and indicate that clipping or normalization may be unnecessary for optimal convergence in a broad class of convex problems.
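For orientation, the setting above can be written out as follows. The proximal operator is the standard definition. The moment condition on the stochastic subgradient $G(x,\xi)$ is the form commonly used to model heavy-tailed noise and is stated here only as an assumption; the symbols $G$, $\xi$, $\alpha$, $\sigma$, and $\gamma$ are illustrative and do not appear in the summary above, and the paper's exact assumptions may differ.

$$
\min_{x}\; F(x) = f(x) + h(x), \qquad
\mathrm{prox}_{\gamma h}(x) = \arg\min_{u}\Big\{ h(u) + \tfrac{1}{2\gamma}\|u - x\|^2 \Big\},
$$

$$
\mathbb{E}\big[G(x,\xi)\big] \in \partial f(x), \qquad
\mathbb{E}\big[\|G(x,\xi) - \mathbb{E}[G(x,\xi)]\|^{\alpha}\big] \le \sigma^{\alpha}
\quad \text{for some } \alpha \in (1,2].
$$

When $\alpha < 2$ the variance of $G(x,\xi)$ may be infinite, which is the regime in which clipping or normalization is usually introduced.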
Abstract
We study convex composite optimization problems, where the objective function is given by the sum of a prox-friendly function and a convex function whose subgradients are estimated under heavy-tailed noise. Existing work often employs gradient clipping or normalization techniques in stochastic first-order methods to address heavy-tailed noise. In this paper, we demonstrate that a vanilla stochastic algorithm -- without additional modifications such as clipping or normalization -- can achieve optimal complexity for these problems. In particular, we establish that an accelerated stochastic proximal subgradient method achieves a first-order oracle complexity that is universally optimal for smooth, weakly smooth, and nonsmooth convex optimization, as well as for stochastic convex optimization under heavy-tailed noise. Numerical experiments are further provided to validate our theoretical results.
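To make the idea of a "vanilla" method concrete, here is a minimal Python sketch of a plain stochastic proximal subgradient iteration with iterate averaging, with no clipping and no normalization. It is not the paper's accelerated method, whose parameters are not given in this summary. The least-squares choice of $f$, the choice $h = \lambda\|x\|_1$, the Student-t noise model, the $1/\sqrt{k}$ step size, and all function and parameter names are assumptions made purely for illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Exact proximal operator of tau * ||.||_1 (closed form)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def stochastic_prox_subgradient(A, b, lam, steps, step0,
                                noise_scale=1.0, dof=1.5, seed=0):
    """Plain (non-accelerated) stochastic proximal subgradient sketch.

    Minimizes 0.5/n * ||A x - b||^2 + lam * ||x||_1 using gradients of the
    smooth part corrupted by Student-t noise. With 1 < dof <= 2 the noise
    has a finite mean but infinite variance (an illustrative heavy-tailed
    model, not the paper's oracle). No clipping or normalization is applied.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_avg = np.zeros(d)
    for k in range(1, steps + 1):
        grad = A.T @ (A @ x - b) / n                       # exact gradient of f
        noise = noise_scale * rng.standard_t(dof, size=d)  # heavy-tailed noise
        g = grad + noise                                   # stochastic estimate
        gamma = step0 / np.sqrt(k)                         # diminishing step size
        x = soft_threshold(x - gamma * g, gamma * lam)     # proximal step on h
        x_avg += (x - x_avg) / k                           # running average
    return x_avg


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 5))
    x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
    b = A @ x_true
    x_hat = stochastic_prox_subgradient(A, b, lam=0.01, steps=5000, step0=0.5)
    print("averaged iterate:", np.round(x_hat, 3))
```

The averaged iterate is returned because averaging is the usual output of non-accelerated subgradient schemes; an accelerated variant would instead maintain additional extrapolation sequences.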
