
Convergence Analysis of Stochastic Accelerated Gradient Methods for Generalized Smooth Optimizations

Chenhao Yu, Yusu Hong, Junhong Lin

TL;DR

The paper tackles stochastic optimization with possibly nonconvex objectives under generalized smoothness and a relaxed affine-variance noise model. It analyzes RSAG and SGD with both constant and adaptive step sizes, proving high-probability convergence rates of order $\tilde{O}(\sqrt{\log(1/\delta)/T})$ for function-value gaps in the convex setting and for squared gradient norms in the nonconvex setting, improving to $\tilde{O}(\log(1/\delta)/T)$ when the noise levels are small. The adaptive variants (AdaGrad-Norm and RSAG with adaptive step sizes) achieve the same rates without prior knowledge of problem parameters, and the results extend to the standard $L$-smooth case with comparable rates. Overall, the work extends the analysis of stochastic acceleration to weaker noise and smoothness assumptions and offers practical guidance for choosing step sizes without full knowledge of the problem parameters, while preserving optimal high-probability guarantees.
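To make the two rate regimes concrete, the short Python sketch below evaluates $\sqrt{\log(1/\delta)/T}$ and $\log(1/\delta)/T$ with every constant and every logarithmic factor hidden by $\tilde{O}$ dropped. The function names are hypothetical, and the numbers illustrate scaling only; they are not the paper's actual bounds.

```python
import math


def rate_general(T: int, delta: float) -> float:
    """sqrt(log(1/delta)/T): general high-probability rate (constants and log T factors dropped)."""
    return math.sqrt(math.log(1.0 / delta) / T)


def rate_small_noise(T: int, delta: float) -> float:
    """log(1/delta)/T: improved rate when the noise parameters are small."""
    return math.log(1.0 / delta) / T


if __name__ == "__main__":
    delta = 0.01  # failure probability
    for T in (10**2, 10**4, 10**6):
        print(f"T={T:>8}  general={rate_general(T, delta):.2e}  "
              f"small-noise={rate_small_noise(T, delta):.2e}")
```

Because the small-noise rate is the square of the general one, quadrupling $T$ halves the first quantity but quarters the second.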

Abstract

We investigate the Randomized Stochastic Accelerated Gradient (RSAG) method, utilizing either constant or adaptive step sizes, for stochastic optimization problems with generalized smooth objective functions. Under relaxed affine variance assumptions for the stochastic gradient noise, we establish high-probability convergence rates of order $\tilde{O}\left(\sqrt{\log(1/δ)/T}\right)$ for function value gaps in the convex setting, and for the squared gradient norms in the non-convex setting. Furthermore, when the noise parameters are sufficiently small, the convergence rate improves to $\tilde{O}\left(\log(1/δ)/T\right)$, where $T$ denotes the total number of iterations and $δ$ is the probability margin. Our analysis is also applicable to SGD with both constant and adaptive step sizes.
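The abstract notes that the analysis also covers SGD with adaptive step sizes. As an illustration, here is a minimal Python sketch of AdaGrad-Norm in its commonly used form, where a single scalar accumulator $b_t^2 = b_{t-1}^2 + \|g_t\|^2$ sets the step size $\eta / b_t$. The gradient oracle, the parameter names `eta` and `b0`, and the noisy quadratic in the usage example are assumptions for illustration; the paper's exact indexing and constants may differ.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def adagrad_norm(
    grad_oracle: Callable[[Vector], Vector],
    x0: Vector,
    eta: float,
    b0: float,
    T: int,
) -> List[Vector]:
    """Run T steps of AdaGrad-Norm and return all iterates x_0, ..., x_T.

    The scalar step eta / b_t adapts to the accumulated squared norms of the
    stochastic gradients, so no smoothness or noise constant is required.
    Choose b0 > 0 to avoid dividing by zero if the first gradient vanishes.
    """
    x = list(x0)
    b_sq = b0 ** 2
    iterates = [list(x)]
    for _ in range(T):
        g = grad_oracle(x)                    # stochastic gradient at x_t
        b_sq += sum(gi * gi for gi in g)      # b_t^2 = b_{t-1}^2 + ||g_t||^2
        step = eta / math.sqrt(b_sq)          # adaptive scalar step size
        x = [xi - step * gi for xi, gi in zip(x, g)]
        iterates.append(list(x))
    return iterates


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical test problem: f(x) = 0.5 * ||x||^2 with additive Gaussian noise.
    def noisy_grad(x: Vector) -> Vector:
        return [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]

    path = adagrad_norm(noisy_grad, x0=[5.0, -3.0], eta=1.0, b0=0.1, T=2000)
    print("final iterate:", path[-1])
```

The only tuning inputs are `eta` and `b0`; large early gradients inflate $b_t$ and automatically shrink later steps.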

Paper Structure

This paper contains 38 sections, 39 theorems, 321 equations, and 1 algorithm.

Key Result

Theorem 1

Let $T>0$ and $\delta\in(0,1)$. Suppose that $\{\overline{x}_t\}_{t\in[T]}$ is a sequence generated by SGD or RSAG with a constant step size, that $f$ is an $\left(L_0,L_1\right)$-smooth function, and that the step size $\theta_t$ satisfies … where $\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{G}$ are defined as … and $\overline{\Delta}_c$, $\mathcal{M}_c$ are given in the following order … The explicit expr…
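Theorem 1 covers RSAG with a constant step size. For readers unfamiliar with the method, the Python sketch below follows a common three-sequence form of randomized stochastic accelerated gradient: a middle point mixes a long-step sequence and a short-step sequence, one stochastic gradient at the middle point updates both, and the output is a randomly chosen middle point. The weights $\alpha_t = 2/(t+1)$, the constant short step `beta`, the long step $(1+\alpha_t/4)\,\beta$, and the uniform choice of the output index are all assumptions made for this sketch; how the paper's $\theta_t$ maps onto these parameters, and the distribution it uses for the output index, are not stated here.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def rsag_constant(
    grad_oracle: Callable[[Vector], Vector],
    x0: Vector,
    beta: float,
    T: int,
    rng: random.Random,
) -> Tuple[Vector, List[Vector]]:
    """Sketch of a randomized stochastic accelerated gradient (RSAG) run.

    Returns a randomly selected middle point x^md_R together with all
    middle points x^md_1, ..., x^md_T.
    """
    x = list(x0)      # x_t: long-step sequence
    x_ag = list(x0)   # x^ag_t: short-step (aggregated) sequence
    mids: List[Vector] = []
    for t in range(1, T + 1):
        alpha = 2.0 / (t + 1)                   # assumed momentum weight
        lam = (1.0 + alpha / 4.0) * beta        # assumed long step, slightly above beta
        # Middle point: convex combination of the two sequences.
        x_md = [(1.0 - alpha) * a + alpha * b for a, b in zip(x_ag, x)]
        g = grad_oracle(x_md)                   # one stochastic gradient per iteration
        x = [xi - lam * gi for xi, gi in zip(x, g)]
        x_ag = [mi - beta * gi for mi, gi in zip(x_md, g)]
        mids.append(x_md)
    R = rng.randrange(T)                        # uniform output index (simplification)
    return mids[R], mids


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical test problem: f(x) = 0.5 * ||x||^2 with additive Gaussian noise.
    def noisy_grad(x: Vector) -> Vector:
        return [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]

    x_out, _ = rsag_constant(noisy_grad, x0=[5.0, -3.0], beta=0.5, T=500, rng=rng)
    print("randomly selected output:", x_out)
```

Note that if the long and short steps were equal, the two sequences would coincide and the loop would reduce to plain SGD, which is why the sketch keeps `lam` distinct from `beta`.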

Theorems & Definitions (85)

  • Definition 1: $\left(L_0,L_1\right)$-smoothness
  • Remark 1
  • Example 1
  • Example 2
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Remark 3
  • Theorem 3
  • Remark 4
  • ...and 75 more
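Definition 1 is listed above without its statement. For reference, a widely used form of $(L_0,L_1)$-smoothness for twice-differentiable $f$ bounds the Hessian norm by an affine function of the gradient norm, and a commonly used first-order variant avoids second derivatives (some papers place absolute constants in front of $L_0$ and $L_1$). The paper's Definition 1 may use either form or a different variant, so treat the following as an assumed reference point rather than its exact statement:

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x,
$$

$$
\|\nabla f(x) - \nabla f(y)\| \le \left(L_0 + L_1 \|\nabla f(x)\|\right)\|x - y\| \quad \text{whenever } \|x - y\| \le \frac{1}{L_1}.
$$

Setting $L_1 = 0$ recovers standard $L$-smoothness with $L = L_0$. For example, $f(x) = e^{x}$ on $\mathbb{R}$ satisfies $f''(x) = e^{x} = |f'(x)|$, so it is $(0,1)$-smooth, yet $f''$ is unbounded, so $f$ is not $L$-smooth for any finite $L$.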