New Tight Bounds for SGD without Variance Assumption: A Computer-Aided Lyapunov Analysis

Daniel Cortild, Lucas Ketels, Juan Peypouquet, Guillaume Garrigos

TL;DR

This work develops variance-free convergence bounds for SGD on smooth convex and strongly convex objectives, combining a Lyapunov energy with a Performance Estimation Problem (PEP) analysis to obtain unified guarantees over the entire step-size range $\gamma L\in(0,2)$. It yields explicit bias-variance decompositions in the convex case and sharp geometric rates in the strongly convex case, while revealing singular behavior at the optimal step-size and exponential growth of the variance in large-step regimes. The authors further validate the tightness of the bounds with PEP-based numerics and extend the framework to stochastic proximal methods and interpolation settings. Overall, the approach clarifies fundamental limits of SGD without variance assumptions and offers practical guidance on step-size choices.

Abstract

The analysis of Stochastic Gradient Descent (SGD) often relies on an assumption on the variance of the stochastic gradients, which is usually not satisfied or is difficult to verify in practice. This paper contributes to a recent line of work that attempts to provide guarantees without any variance assumption, leveraging only the (strong) convexity and smoothness of the loss functions. In this context, we prove new theoretical bounds derived from the monotonicity of a simple Lyapunov energy, improving the current state of the art and extending its validity to larger step-sizes. Our theoretical analysis is backed by a Performance Estimation Problem analysis, which allows us to claim that, empirically, the bias term in our bounds is tight within our framework.
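To make the setting concrete, here is a minimal sketch of constant-step SGD with a step-size chosen through the product $\gamma L$, as in the range $\gamma L\in(0,2)$ mentioned in the TL;DR. It is not the authors' code: the least-squares losses, the function name `sgd_least_squares`, the choice $L = \max_i \|a_i\|^2$, and the synthetic data are all assumptions made for illustration. The data are built so that interpolation holds (every individual loss is minimized at the same point `x_star`), in which case the distance to `x_star` cannot increase along the iterates for any $\gamma L \in (0,2]$.

```python
import numpy as np

def sgd_least_squares(A, b, gamma_L, T, seed=0):
    """Run T steps of SGD on f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2.

    gamma_L is the product gamma * L, where L = max_i ||a_i||^2 is the
    smoothness constant of the individual losses (illustrative choice).
    Returns the iterates x_0, ..., x_T as an array of shape (T + 1, d).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.max(np.sum(A**2, axis=1))
    gamma = gamma_L / L
    x = np.zeros(d)
    iterates = [x.copy()]
    for _ in range(T):
        i = rng.integers(n)                 # sample one loss uniformly
        grad_i = (A[i] @ x - b[i]) * A[i]   # gradient of the sampled loss
        x = x - gamma * grad_i
        iterates.append(x.copy())
    return np.array(iterates)

# Hypothetical synthetic problem, assumed for illustration only.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_star = rng.standard_normal(5)
b = A @ x_star  # interpolation: every individual loss is minimized at x_star

iterates = sgd_least_squares(A, b, gamma_L=1.5, T=200)
dist = np.linalg.norm(iterates - x_star, axis=1)
print(dist[0], dist[-1])
```

The step-size is parameterized by `gamma_L` rather than `gamma` so that the same value can be compared across problems with different smoothness constants.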

Paper Structure

This paper contains 53 sections, 50 theorems, 255 equations, and 14 figures.

Key Result

Lemma 2.2

Let Assumption S2::ass:conv hold. Assume that $\mathbb{E}\left[E_{t+1}\right] \leq \mathbb{E}\left[E_t\right]$ for every $t=0, \dots, T-1$, that $\rho > 0$, and, without loss of generality, that $a_0=1$. Then […], where $\bar{x}_T = \tfrac{1}{T}\sum_{t=0}^{T-1} x_t$ and $\bar{e} = \tfrac{1}{T}\sum_{t=0}^{T-1} e_t$.
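The averaged iterate $\bar{x}_T$ in the lemma runs over $x_0, \dots, x_{T-1}$, so the last iterate $x_T$ is not part of the average. The short sketch below illustrates only that index convention; the helper name `average_iterate` and the toy iterates are hypothetical and not taken from the paper.

```python
import numpy as np

def average_iterate(iterates):
    """Return x_bar_T = (1/T) * sum_{t=0}^{T-1} x_t.

    `iterates` holds x_0, ..., x_T (T + 1 rows); the last row x_T is
    deliberately excluded, matching the index range in the lemma.
    """
    iterates = np.asarray(iterates, dtype=float)
    T = iterates.shape[0] - 1
    if T < 1:
        raise ValueError("need at least x_0 and x_1")
    return iterates[:T].mean(axis=0)

# Toy iterates x_0, ..., x_3 (so T = 3); x_3 does not affect the average.
xs = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 2.0], [100.0, 100.0]])
x_bar = average_iterate(xs)
print(x_bar)
```

The same uniform average over $t = 0, \dots, T-1$ applies to $\bar{e}$, given the corresponding sequence $e_t$.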

Figures (14)

  • Figure 1: Theoretical and numerical bias term ($L=1$).
  • Figure 2: Theoretical and numerical average variance term ($L=1$).
  • Figure 3: Theoretical and numerical bias term ($L=1$, $\mu = 0.25$ and $\tfrac{2}{\mu + L} = 1.6$).
  • Figure 4: Theoretical and numerical total variance term ($L=1$, $\mu = 0.25$ and $\tfrac{2}{\mu + L} = 1.6$).
  • Figure 5: Theoretical and numerical bias term in the convex setting for the full range of step-sizes.
  • ...and 9 more figures

Theorems & Definitions (93)

  • Lemma 2.2: Bound from Lyapunov decrease
  • Theorem 2.3: Convex case, short step-sizes
  • Theorem 2.4: Convex case, optimal step-size
  • Remark 2.5: Stochastic proximal algorithm
  • Corollary 2.6: Convex case with interpolation, optimal step-size
  • Theorem 2.7: Convex case, large step-sizes
  • Lemma 3.2: Bound from Lyapunov decrease
  • Theorem 3.3: Strongly convex case, sharp bias
  • Theorem 3.4: Strongly convex case, sub-optimal bias
  • Corollary 3.5: Strongly convex case with interpolation, optimal bias
  • ...and 83 more