A Generalized Version of Chung's Lemma and its Applications
Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu
TL;DR
This work introduces a generalized Chung's Lemma that accommodates a broad class of step-size rules by analyzing a recursion of the form $a_{k+1}\le(1-1/s(b_k))a_k+1/t(b_k)$ with convex rate mappings $r\colon b\mapsto s(b)/t(b)$. Leveraging this tool, the authors derive non-asymptotic convergence rates for stochastic gradient methods under the $(\theta,\mu)$-Polyak-Łojasiewicz (PL) condition for exponential, cosine, constant, and polynomial step sizes, covering both SGD and random reshuffling. A key contribution is a splitting technique together with an extension lemma; these handle non-polynomial schedules and recursions that hold only for a subset of the iterates. The resulting rates are explicit in the PL exponent $\theta$ and adapt to landscape and gradient-noise characteristics; in particular, exponential step sizes are shown to adapt to both landscape and noise. The results unify and extend existing non-asymptotic analyses and give rate bounds and guidance for step-size design in nonconvex stochastic optimization. The framework offers a systematic way to certify convergence rates under general step-size rules, with implications for algorithmic tuning in large-scale learning problems.
Abstract
Chung's Lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's Lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate the broad applicability of the proposed generalized lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as Stochastic Gradient Descent (SGD) and Random Reshuffling (RR), under a general $(\theta,\mu)$-Polyak-Łojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes exhibit superior adaptivity to both landscape geometry and gradient noise; specifically, they achieve optimal convergence rates without requiring exact knowledge of the underlying landscape or separate parameter selection strategies for noisy and noise-free regimes. Our results demonstrate that the developed variant of Chung's Lemma offers a versatile, systematic, and streamlined approach to establishing non-asymptotic convergence rates under general step size rules.
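To make the type of recursion behind Chung's Lemma concrete, the following minimal Python sketch iterates $a_{k+1}=(1-1/s(b_k))a_k+1/t(b_k)$ with equality, i.e., the worst case of the inequality. All names and parameter choices here are illustrative assumptions rather than the paper's setup: the helper `iterate_recursion`, the constants `c`, `d`, and `k0`, and the polynomial instance $1/s(b)=c/b$, $1/t(b)=d/b^2$ with $b_k=k+k_0$. For this classical instance with $c>1$, the scaled sequence $b_k a_k$ is expected to approach $d/(c-1)$.

```python
def iterate_recursion(s, t, b, a0, num_steps):
    """Iterate a_{k+1} = (1 - 1/s(b_k)) * a_k + 1/t(b_k) with equality.

    s, t : callables mapping b_k to the contraction and error scales
    b    : callable mapping the iteration index k to b_k
    """
    a = a0
    history = [a]
    for k in range(num_steps):
        bk = b(k)
        a = (1.0 - 1.0 / s(bk)) * a + 1.0 / t(bk)
        history.append(a)
    return history


# Illustrative (assumed) polynomial instance: 1/s(b) = c/b, 1/t(b) = d/b^2.
c, d = 2.0, 1.0
k0 = 2  # offset chosen so that 1/s(b_k) = c/(k + k0) never exceeds 1


def s(bk):
    return bk / c


def t(bk):
    return bk * bk / d


def b(k):
    return k + k0


hist = iterate_recursion(s, t, b, a0=1.0, num_steps=100_000)
K = len(hist) - 1

# Scale the last iterate by b_K; this should be close to d / (c - 1).
scaled = hist[-1] * b(K)
print(f"b_K * a_K = {scaled:.6f}, d/(c-1) = {d / (c - 1):.6f}")
```

Swapping in other choices of `s`, `t`, and `b` gives a quick numerical check of how a candidate step size rule affects the decay of $a_k$ before attempting a formal bound.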
