Convergence Analysis of Stochastic Accelerated Gradient Methods for Generalized Smooth Optimizations

Chenhao Yu; Yusu Hong; Junhong Lin

TL;DR

The paper tackles stochastic optimization with possibly nonconvex objectives under generalized smoothness and a relaxed affine-variance noise model. It develops RSAG and SGD analyses with both constant and adaptive step sizes, proving high-probability convergence rates that scale as $\tilde{O}(\sqrt{\log(1/\delta)/T})$ for function-value gaps in the convex setting and for squared gradient norms in the nonconvex setting, with improved rates $\tilde{O}(\log(1/\delta)/T)$ when noise levels are small. The adaptive analyses (AdaGrad-Norm and RSAG with adaptive steps) achieve the same rates without prior knowledge of problem parameters, and the results extend to the standard $L$-smooth case with comparable rates. Overall, the work unifies and extends stochastic acceleration under weaker noise and smoothness assumptions, providing practical guidance for choosing step sizes without full problem-parameter knowledge while preserving optimal high-probability guarantees.
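For orientation, the two assumptions named above are often formalized as follows. This is a sketch of commonly used forms, not necessarily the paper's exact conditions: the constants $L_0$, $L_1$, $\sigma_0$, $\sigma_1$ and the stochastic gradient $g(x)$ are notation assumed here, and the paper's "generalized" and "relaxed" versions may be weaker than what is shown.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \qquad \text{(generalized smoothness, for twice-differentiable } f\text{)}
$$

$$
\mathbb{E}\left[\|g(x) - \nabla f(x)\|^2\right] \le \sigma_0^2 + \sigma_1^2 \|\nabla f(x)\|^2 \qquad \text{(affine variance)}
$$

Setting $L_1 = 0$ recovers the standard $L$-smooth case with $L = L_0$, and setting $\sigma_1 = 0$ recovers the usual bounded-variance noise model.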

Abstract

We investigate the Randomized Stochastic Accelerated Gradient (RSAG) method, utilizing either constant or adaptive step sizes, for stochastic optimization problems with generalized smooth objective functions. Under relaxed affine variance assumptions for the stochastic gradient noise, we establish high-probability convergence rates of order $\tilde{O}\left(\sqrt{\log(1/\delta)/T}\right)$ for function value gaps in the convex setting, and for the squared gradient norms in the non-convex setting. Furthermore, when the noise parameters are sufficiently small, the convergence rate improves to $\tilde{O}\left(\log(1/\delta)/T\right)$, where $T$ denotes the total number of iterations and $\delta$ is the probability margin. Our analysis is also applicable to SGD with both constant and adaptive step sizes.
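To make "SGD with adaptive step sizes" concrete, the following is a minimal sketch of the textbook AdaGrad-Norm update, in which a single scalar step size $\eta / \sqrt{b_0^2 + \sum_{s \le t} \|g_s\|^2}$ is shared by all coordinates. It is not taken from the paper: the function names, the noisy quadratic test problem, and the values of `eta`, `b0`, and `sigma` are illustrative assumptions, and the paper's adaptive RSAG variant may differ in its details.

```python
import math
import random


def adagrad_norm(grad_fn, x0, eta=1.0, b0=1e-2, steps=1000):
    """Textbook AdaGrad-Norm: x <- x - eta / sqrt(b0^2 + sum ||g_s||^2) * g."""
    x = list(x0)
    accum = b0 ** 2  # running value of b0^2 + sum of squared gradient norms
    for _ in range(steps):
        g = grad_fn(x)
        accum += sum(gi * gi for gi in g)
        step = eta / math.sqrt(accum)  # no smoothness or noise constants needed
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def make_noisy_quadratic_grad(sigma=0.1, seed=0):
    """Hypothetical test oracle: gradient of f(x) = 0.5 * ||x||^2 plus Gaussian noise."""
    rng = random.Random(seed)

    def grad(x):
        return [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]

    return grad


x_final = adagrad_norm(make_noisy_quadratic_grad(), x0=[3.0, -2.0], steps=2000)
final_norm = math.sqrt(sum(xi * xi for xi in x_final))
print(f"distance to minimizer after 2000 steps: {final_norm:.4f}")
```

The step size is computed only from observed stochastic gradients, which is what allows this kind of method to run without prior knowledge of problem parameters.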
