A Generalized Version of Chung's Lemma and its Applications

Li Jiang; Xiao Li; Andre Milzarek; Junwen Qiu

TL;DR

This work introduces a generalized Chung's Lemma that accommodates a broad class of step-size rules by analyzing recursions of the form $a_{k+1}\le(1-1/s(b_k))a_k+1/t(b_k)$ with a convex rate mapping $r\colon b\mapsto s(b)/t(b)$. Using this tool, the authors derive non-asymptotic convergence rates for SGD and random reshuffling under the $(\theta,\mu)$-Polyak-Lojasiewicz condition with exponential, cosine, constant, and polynomial step sizes. Key ingredients are a splitting technique and an extension lemma, which handle non-polynomial schedules and cases where the lemma applies only to a subset of the iterates; the resulting rates are explicit in the PL exponent $\theta$. Notably, exponential step sizes are shown to adapt to both the landscape geometry and the gradient noise. The results unify and extend existing non-asymptotic analyses and offer a systematic way to certify convergence rates under general step-size rules, with implications for step-size design in large-scale nonconvex stochastic optimization.
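To make the recursion in the TL;DR concrete, the following minimal sketch iterates its equality case for one polynomial instance and compares the result with the rate mapping $r(b)=s(b)/t(b)$. The specific choices $s(x)=x^p/c$, $t(x)=x^{p+q}/d$, $b_k=k+\gamma$ and all constants are illustrative assumptions, not taken from the paper; they correspond to the classical Chung-type recursion $a_{k+1}=(1-c/(k+\gamma)^p)\,a_k+d/(k+\gamma)^{p+q}$.

```python
def simulate_recursion(a0, b, s, t):
    """Iterate a_{k+1} = (1 - 1/s(b_k)) * a_k + 1/t(b_k) (equality case)."""
    a = [a0]
    for bk in b:
        a.append((1.0 - 1.0 / s(bk)) * a[-1] + 1.0 / t(bk))
    return a


# Hypothetical polynomial instance (all constants are assumptions).
c, d, p, q, gamma = 2.0, 1.0, 1.0, 1.0, 2.0


def s(x):
    # s(x) >= 1 holds for every x >= c ** (1 / p)
    return x ** p / c


def t(x):
    # t(x) > 0 holds for every x > 0
    return x ** (p + q) / d


def r(x):
    # Rate mapping r = s / t = (d / c) * x ** (-q), convex for x > 0
    return s(x) / t(x)


K = 10_000
b = [k + gamma for k in range(K)]      # b_0, ..., b_{K-1}
a = simulate_recursion(1.0, b, s, t)   # a_0, ..., a_K

# Compare the final iterate with the rate mapping at b_K = K + gamma.
ratio = a[-1] / r(K + gamma)
print(f"a_K = {a[-1]:.3e}, r(b_K) = {r(K + gamma):.3e}, ratio = {ratio:.3f}")
```

For $p=1$ and $c>q$, the classical Chung's Lemma predicts $a_k \approx \frac{d}{c-q}k^{-q}$, so the printed ratio should settle near $c/(c-q)$; the sequence decays at the order of $r(b_k)$, up to a constant factor.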

Abstract

Chung's Lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's Lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad applicability of the proposed generalized lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as Stochastic Gradient Descent (SGD) and Random Reshuffling (RR), under a general $(\theta,\mu)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes exhibit superior adaptivity to both landscape geometry and gradient noise; specifically, they achieve optimal convergence rates without requiring exact knowledge of the underlying landscape or separate parameter selection strategies for noisy and noise-free regimes. Our results demonstrate that the developed variant of Chung's Lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.
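As a rough illustration of the four step size families named in the abstract, the sketch below implements one common parametrization of each. These forms (in particular the exponential decay factor $(\beta/K)^{1/K}$ and the half-cosine schedule) are widely used in the literature but are assumptions here; the paper's exact parametrizations and constants may differ, and the function and parameter names are hypothetical.

```python
import math


def constant_step(alpha, k, K):
    """Constant rule: the same step size at every iteration."""
    return alpha


def polynomial_step(alpha, k, K, p=1.0, gamma=1.0):
    """Polynomial rule: alpha / (k + gamma)^p (assumed form)."""
    return alpha / (k + gamma) ** p


def exponential_step(alpha, k, K, beta=1.0):
    """Exponential rule: alpha * decay^k with decay = (beta / K)^(1 / K),
    so the step shrinks from alpha to alpha * beta / K over K iterations
    (assumed form)."""
    decay = (beta / K) ** (1.0 / K)
    return alpha * decay ** k


def cosine_step(alpha, k, K):
    """Cosine rule: alpha * (1 + cos(k * pi / K)) / 2 (assumed form)."""
    return alpha * (1.0 + math.cos(k * math.pi / K)) / 2.0


if __name__ == "__main__":
    K = 100
    rules = (constant_step, polynomial_step, exponential_step, cosine_step)
    for k in (0, K // 2, K):
        values = ", ".join(f"{rule.__name__}={rule(1.0, k, K):.4f}" for rule in rules)
        print(f"k={k}: {values}")
```

All four functions share the signature `(alpha, k, K, ...)` so that they can be swapped inside the same training loop; only the exponential and cosine rules actually depend on the horizon `K`.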

Paper Structure (26 sections, 21 theorems, 138 equations, 2 figures, 1 table, 2 algorithms)

Introduction
Contributions
Related Works
Generalized Chung's Lemma
Convergence Rates of Stochastic Methods under the PL Condition
Basic Assumptions and Descent Properties
Main Recursion and Analysis Roadmap
Notations and Important Constants
The Analysis of Exponential and Cosine Step Sizes
Analyzing Constant and Polynomial Step Sizes
Constant Step Size
Results on Polynomial Step Sizes
Landscape Adaptivity and Noise Adaptivity of Exponential Step Sizes
Landscape Adaptivity
Noise Adaptivity
...and 11 more sections

Key Result

Theorem 2.1

Let $\{a_k\}_k \subseteq \mathbb{R},\, \{b_k\}_k \subseteq \mathbb{R}$ and functions $s,t: \mathbb{R} \to \mathbb{R}$ be given. Assume that $\{a_k\}_k$ follows the recursion $a_{k+1} \leq \left(1 - \frac{1}{s(b_k)}\right) a_k + \frac{1}{t(b_k)}$ and there is an interval $I$ such that $b_k \in I$ for all $k\in[K]$, $s(x)\geqslant 1$ and $t(x)>0$ for all $x\in I$, then we have

Figures (2)

Figure 1: Heatmap of the rates $\mathcal{w}_s$, $\mathcal{w}_r$ w.r.t. $p,\theta$. The blue lines depict the optimal choice of $p$ w.r.t. $\theta$; left: $p = \varrho_s = \frac{2\theta}{4\theta-1}$ (${\sf SGD }$), right: $p = \varrho_r = \frac{\theta}{3\theta-1}$ (${\sf RR }$).
Figure 2: Relation between $\log_2(y_K)$ and $\log_2(K)$ for different $\theta$ with polynomial and exponential step sizes. A steeper slope corresponds to a faster sublinear rate.
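The two blue-line formulas in the caption of Figure 1 can be evaluated directly. The short sketch below computes $\varrho_s = \frac{2\theta}{4\theta-1}$ and $\varrho_r = \frac{\theta}{3\theta-1}$ in exact rational arithmetic for a few values of $\theta$; the function names are hypothetical, and the sampled range $\theta\in[\tfrac12,1]$ is an assumption about the admissible PL exponents.

```python
from fractions import Fraction


def optimal_p_sgd(theta):
    """Caption formula for SGD: p = 2*theta / (4*theta - 1)."""
    theta = Fraction(theta)
    return 2 * theta / (4 * theta - 1)


def optimal_p_rr(theta):
    """Caption formula for RR: p = theta / (3*theta - 1)."""
    theta = Fraction(theta)
    return theta / (3 * theta - 1)


if __name__ == "__main__":
    # Assumed sample values of the PL exponent theta.
    for theta in (Fraction(1, 2), Fraction(3, 4), Fraction(1)):
        print(f"theta={theta}: SGD p={optimal_p_sgd(theta)}, RR p={optimal_p_rr(theta)}")
```

Both expressions equal $1$ at $\theta=\tfrac12$ and decrease as $\theta$ grows, so under these formulas a larger PL exponent calls for a more slowly decaying polynomial step size.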

Theorems & Definitions (25)

Theorem 2.1: Generalized Chung's Lemma
Example 2.2
Remark 2.3
Lemma 2.4: Non-asymptotic Chung's Lemma
Lemma 2.5: Extension Lemma
Corollary 2.6
Definition 3.3: ($\theta$,$\mu$)-PL Condition
Lemma 3.5: Descent-type Properties
Remark 3.6
Lemma 3.7
...and 15 more
