Handbook of Convergence Theorems for (Stochastic) Gradient Methods
Guillaume Garrigos, Robert M. Gower
TL;DR
This handbook compiles concise, copyable convergence proofs for gradient-based methods in the smooth, convex, strongly convex, and Polyak-Łojasiewicz (PL) settings. It covers stochastic variants (SGD, minibatch SGD, momentum, the subgradient method, and SPS) and nonsmooth tools (proximal methods, subdifferentials). Core concepts such as expected smoothness, variance transfer, and interpolation are used to bound stochastic behavior in a unified way, with explicit iteration and complexity rates under various step-size regimes. Proximal and nonsmooth analysis, together with momentum techniques, connects the deterministic and stochastic theory, and bibliographic notes point to foundational sources. The result is a modular reference for proving convergence under common optimization structures and for understanding how conditions such as interpolation and the PL inequality influence rates.
Abstract
This is a handbook of simple proofs of the convergence of gradient and stochastic gradient descent type methods. We consider functions that are Lipschitz, smooth, convex, strongly convex, and/or Polyak-Łojasiewicz. Our focus is on "good proofs" that are also simple. Each section can be consulted separately. We start with proofs for gradient descent, then move on to stochastic variants, including minibatching and momentum. We then turn to nonsmooth problems, covering the subgradient method, proximal gradient descent, and their stochastic variants. Our focus is on global convergence rates and complexity rates. Some slightly less common proofs found here include those of SGD (stochastic gradient descent) with a proximal step, with momentum, and with minibatching without replacement.
