Tight Lower Bounds and Optimal Algorithms for Stochastic Nonconvex Optimization with Heavy-Tailed Noise

Adrien Fradin; Abdurakhmon Sadiev; Laurent Condat; Peter Richtárik

Abstract

We study stochastic nonconvex optimization under heavy-tailed noise. In this setting, the stochastic gradients have only a bounded $p$-th central moment ($p$-BCM) for some $p \in (1,2]$. Building on the foundational work of Arjevani et al. (2022) in stochastic optimization, we establish tight sample complexity lower bounds for all first-order methods under relaxed mean-squared smoothness ($q$-WAS) and $\delta$-similarity ($(q, \delta)$-S) assumptions, allowing any exponent $q \in [1,2]$ instead of the standard $q = 2$. These results substantially broaden the scope of existing lower bounds. To complement them, we show that Normalized Stochastic Gradient Descent with Momentum Variance Reduction (NSGD-MVR), a known algorithm, matches these bounds in expectation. Beyond expectation guarantees, we introduce a new algorithm, Double-Clipped NSGD-MVR, which allows the derivation of high-probability convergence rates under weaker assumptions than in previous works. Finally, for second-order methods with stochastic Hessians satisfying a bounded $q$-th central moment assumption for some exponent $q \in [1, 2]$ (allowing $q \neq p$), we establish sharper lower bounds than previous works, improving over Sadiev et al. (2025), who consider only $p = q$, and yielding stronger convergence exponents. Together, these results provide a nearly complete complexity characterization of stochastic nonconvex optimization in heavy-tailed regimes.
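To make the NSGD-MVR idea concrete, here is a minimal sketch of a normalized step combined with a momentum variance-reduction estimator, in the form commonly given in the literature. It is not taken from this paper and may differ from the authors' exact algorithm. The function names, the toy objective $F(x) = \frac{1}{2}\|x\|^2$, the symmetric Pareto-type noise model, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def heavy_tailed_noise(rng, tail_index=1.5):
    """Symmetric Pareto-type noise (illustrative assumption).

    It has zero mean for tail_index > 1 and a finite p-th moment
    only for p < tail_index, so its variance is infinite here.
    """
    magnitude = rng.paretovariate(tail_index) - 1.0
    return magnitude if rng.random() < 0.5 else -magnitude


def stochastic_grad(x, noise):
    """Gradient of the toy objective F(x) = 0.5 * ||x||^2 plus additive noise."""
    return [xi + ni for xi, ni in zip(x, noise)]


def normalized_step(x, g, step_size):
    """Move a fixed distance step_size along -g / ||g||."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(x)
    return [xi - step_size * gi / norm for xi, gi in zip(x, g)]


def nsgd_mvr(x0, steps=200, step_size=0.05, momentum=0.1, seed=0):
    """Hypothetical sketch of normalized SGD with momentum variance reduction.

    Estimator: g_t = grad(x_t; xi_t) + (1 - momentum) * (g_{t-1} - grad(x_{t-1}; xi_t)),
    where the same sample xi_t is used at both points.
    """
    rng = random.Random(seed)
    x_prev = list(x0)
    noise = [heavy_tailed_noise(rng) for _ in x0]
    g = stochastic_grad(x_prev, noise)  # plain stochastic gradient at the start
    x = normalized_step(x_prev, g, step_size)
    trajectory = [x_prev, x]
    for _ in range(steps - 1):
        noise = [heavy_tailed_noise(rng) for _ in x]
        grad_new = stochastic_grad(x, noise)       # gradient at x_t with sample xi_t
        grad_old = stochastic_grad(x_prev, noise)  # gradient at x_{t-1}, same sample
        g = [gn + (1.0 - momentum) * (gi - go)
             for gn, gi, go in zip(grad_new, g, grad_old)]
        x_prev, x = x, normalized_step(x, g, step_size)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    path = nsgd_mvr([3.0, -4.0])
    print("final iterate:", path[-1])
```

Because the update is normalized, each iterate moves exactly `step_size` regardless of how large a noise sample is. This is one intuition for why such methods tolerate heavy tails in expectation. The clipping used in the paper's Double-Clipped variant is not shown here.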
