Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
Chuan He, Zhaosong Lu, Defeng Sun, Zhanwang Deng
TL;DR
The paper addresses unconstrained smooth optimization under heavy-tailed stochastic gradient noise by developing three practical normalized stochastic first-order methods (SFOMs) with momentum: Polyak, multi-extrapolated, and recursive momentum. Each method uses dynamically updated parameters and normalization to avoid dependence on unknown Lipschitz constants or noise bounds, and achieves first-order oracle complexities that match or improve on the best-known results under heavy-tailed noise and weaker smoothness assumptions. The authors extend the analysis to higher-order smoothness, which yields accelerated rates for the multi-extrapolated variant, and to a weakly average smoothness condition for the recursive variant. Numerical experiments on data fitting, robust regression, and multimodal contrastive learning support the methods' practical effectiveness and illustrate the effects of parameter tuning and momentum. Overall, the work provides parameter-free or parameter-light SFOMs with strong theoretical guarantees and practical performance in the presence of heavy-tailed noise.
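For orientation, heavy-tailed gradient noise is commonly formalized in the literature as a bounded central moment of order alpha. The symbols below (G for the stochastic gradient, xi for the random sample, sigma, alpha) are assumed for illustration and are not taken from the summary above; the paper's exact assumption may be stated differently.

```latex
\mathbb{E}_{\xi}\!\left[\,\|G(x;\xi)-\nabla f(x)\|^{\alpha}\,\right] \le \sigma^{\alpha}
\quad \text{for all } x, \qquad \alpha \in (1,2].
```

Taking alpha = 2 recovers the usual bounded-variance condition, while alpha < 2 allows noise with infinite variance.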
Abstract
In this paper, we propose practical normalized stochastic first-order methods with Polyak momentum, multi-extrapolated momentum, and recursive momentum for solving unconstrained optimization problems. These methods employ dynamically updated algorithmic parameters and do not require explicit knowledge of problem-dependent quantities such as the Lipschitz constant or noise bound. We establish first-order oracle complexity results for finding approximate stochastic stationary points under heavy-tailed noise and weakly average smoothness conditions -- both of which are weaker than the commonly used bounded variance and mean-squared smoothness assumptions. Our complexity bounds either improve upon or match the best-known results in the literature. Numerical experiments are presented to demonstrate the practical effectiveness of the proposed methods.
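As a rough illustration of the general idea, the following is a minimal sketch of a normalized stochastic gradient step with Polyak (exponential-average) momentum. It is not the authors' algorithm: the function names, the test problem, and in particular the schedules for the step size `eta` and momentum weight `gamma` are placeholders assumed here, since the abstract does not state the paper's parameter choices.

```python
import numpy as np

def normalized_sgd_polyak(grad_oracle, x0, num_iters=2000, eta0=1.0, seed=0):
    """Sketch of normalized SGD with Polyak momentum (assumed schedules)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    for k in range(num_iters):
        # Placeholder schedules, chosen only for illustration.
        gamma = 1.0 / np.sqrt(k + 1.0)   # momentum weight; equals 1 at k = 0
        eta = eta0 / (k + 1.0) ** 0.75   # step size
        g = grad_oracle(x, rng)          # stochastic gradient sample
        m = (1.0 - gamma) * m + gamma * g
        norm = np.linalg.norm(m)
        if norm > 0.0:
            # Normalization: the step length is eta regardless of how
            # large the (possibly heavy-tailed) gradient estimate is.
            x = x - eta * m / norm
    return x

def noisy_quadratic_grad(x, rng):
    # Hypothetical test oracle: gradient of f(x) = 0.5 * ||x||^2 plus
    # Student-t noise with 1.5 degrees of freedom (infinite variance).
    return x + rng.standard_t(df=1.5, size=x.shape)

x_final = normalized_sgd_polyak(noisy_quadratic_grad, x0=np.full(5, 2.0))
print(np.linalg.norm(x_final))
```

Because each update has length exactly `eta`, a single extreme noise sample cannot move the iterate arbitrarily far, which is the intuition behind using normalization under heavy-tailed noise. Neither schedule requires a Lipschitz constant or noise bound, in the spirit of the parameter-free design described above.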
