Handbook of Convergence Theorems for (Stochastic) Gradient Methods

Guillaume Garrigos; Robert M. Gower

TL;DR

This handbook compiles concise, reusable convergence proofs for gradient-based methods in the smooth, convex, strongly convex, and Polyak-Łojasiewicz settings. It covers stochastic variants (SGD, minibatch SGD, momentum, subgradient, and SPS) and nonsmooth tools (proximal methods, subdifferentials). It introduces expected smoothness, variance transfer, and interpolation as core concepts for bounding stochastic behavior, and gives explicit iteration and complexity rates under several step-size regimes. Proximal/nonsmooth analysis and momentum techniques connect the deterministic and stochastic theory, and bibliographic notes point to foundational sources. The result is a modular reference for proving convergence under common optimization structures and for understanding how conditions such as interpolation and the PL inequality affect rates.

Abstract

This is a handbook of simple proofs of the convergence of gradient and stochastic gradient descent type methods. We consider functions that are Lipschitz, smooth, convex, strongly convex, and/or Polyak-Łojasiewicz. Our focus is on "good proofs" that are also simple. Each section can be consulted separately. We start with proofs for gradient descent, then move on to stochastic variants, including minibatching and momentum. We then turn to nonsmooth problems with the subgradient method, proximal gradient descent, and their stochastic variants. Our focus is on global convergence rates and complexity rates. Some slightly less common proofs found here include those of SGD (stochastic gradient descent) with a proximal step, with momentum, and with mini-batching without replacement.
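As a small illustration of the kind of global rate the abstract refers to, the sketch below runs gradient descent with step size $1/L$ on a toy quadratic and checks the standard linear bound $f(x_T) - \inf f \le (1 - \mu/L)^T (f(x_0) - \inf f)$ for an $L$-smooth, $\mu$-strongly convex function. The test function, the starting point, and all names (`gradient_descent`, `a`, `x0`, `T`) are assumptions made for this example and are not taken from the handbook.

```python
# Minimal sketch: gradient descent on a hypothetical diagonal quadratic
# f(x) = 0.5 * sum(a_i * x_i^2), which is L-smooth with L = max(a)
# and mu-strongly convex with mu = min(a). Here inf f = 0.

def gradient_descent(grad, x0, step, iters):
    """Run `iters` steps of x <- x - step * grad(x) and return the last iterate."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

a = [1.0, 4.0, 10.0]          # assumed curvatures (illustrative only)

def f(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad_f(x):
    return [ai * xi for ai, xi in zip(a, x)]

L = max(a)                    # smoothness constant
mu = min(a)                   # strong convexity constant
x0 = [1.0, -2.0, 3.0]         # assumed starting point
T = 50                        # number of iterations

xT = gradient_descent(grad_f, x0, 1.0 / L, T)
bound = (1.0 - mu / L) ** T * f(x0)   # linear-rate bound on f(x_T) - inf f

print("f(x_T) =", f(xT))
print("bound  =", bound)
print("bound holds:", f(xT) <= bound)
```

On this toy problem the bound is loose, because each coordinate contracts at its own rate; the guarantee only tracks the slowest one.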

Paper Structure

This paper contains 68 sections, 99 theorems, 380 equations, 1 figure, 1 table, 10 algorithms.

Introduction
Theory: Smooth functions and convexity
Differentiability
Notations
Lipschitz functions
Convexity
Strong convexity
Polyak-Łojasiewicz
Smoothness
Smoothness and nonconvexity
Smoothness and Convexity
Gradient Descent
Convergence for convex and smooth functions
Convergence for strongly convex and smooth functions
Convergence for Polyak-Łojasiewicz and smooth functions
...and 53 more sections

Key Result

lemma 6

Let $\mathcal{F} : \mathbb{R}^d \to \mathbb{R}^p$ be differentiable, and $L>0$. Then $\mathcal{F}$ is $L$-Lipschitz if and only if

Figures (1)

Figure 1: Graph of a PŁ function $f: \mathbb{R}^2 \to \mathbb{R}$. Note that the function is not convex, but that the only critical points are the global minimizers (displayed as a white curve).
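To see why a PŁ function can only have global minimizers as critical points, recall the standard Polyak-Łojasiewicz inequality, written here with a constant $\mu > 0$ (a symbol assumed for this note, since the excerpt does not show the definition):

```latex
f(x) - \inf f \;\le\; \frac{1}{2\mu}\,\|\nabla f(x)\|^2
\quad \text{for all } x \in \mathbb{R}^2 .
```

If $\nabla f(x) = 0$, the right-hand side vanishes, so $f(x) \le \inf f$ and $x$ is a global minimizer. Convexity is never used in this argument, which is consistent with the non-convex graph described in the caption.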

Theorems & Definitions (241)

definition 1: Jacobian
remark 2: Gradient
definition 3: Hessian
remark 4: Hessian and eigenvalues
definition 5
lemma 6
proof
definition 7
lemma 8
proof
...and 231 more
