Functional Central Limit Theorem for Stochastic Gradient Descent

Kessang Flamand; Victor-Emmanuel Brunel

TL;DR

This work characterizes the full asymptotic behavior of SGD trajectories for convex objectives, not merely the endpoint. It establishes a functional central limit theorem for the rescaled SGD path with step size $t_n = \delta/n$, proving convergence to a diffusion process $\{Y_t\}$ on $(0,\infty)$ that solves the SDE $dY_t = -t^{-1}H Y_t\,dt + \Sigma^{1/2}\,dB_t$, where $H = \delta\nabla^2\Phi(\theta^*) - I_d$ and $\Sigma = \delta^2\Gamma$. This yields Gaussian fluctuations of the trajectory and, in particular, $\sqrt{n}(\hat{\theta}_n - \theta^*) \Rightarrow N(0,\Sigma)$, giving a trajectory-level counterpart to classical CLTs that also applies to non-smooth robust objectives such as the geometric median. The results give a diffusion-based picture of long-term SGD behavior under mild convexity assumptions. Two limitations are noted: choosing $\delta$ requires knowledge of the local curvature, and the variance of standard SGD may be larger than that of ERM benchmarks. Future work includes extending the result to averaging schemes (e.g., Polyak-Ruppert) to obtain asymptotic efficiency.

Abstract

We study the asymptotic shape of the trajectory of the stochastic gradient descent algorithm applied to a convex objective function. Under mild regularity assumptions, we prove a functional central limit theorem for the properly rescaled trajectory. Our result characterizes the long-term fluctuations of the algorithm around the minimizer by providing a diffusion limit for the trajectory. In contrast with classical central limit theorems for the last iterate or Polyak-Ruppert averages, this functional result captures the temporal structure of the fluctuations and applies to non-smooth settings such as robust location estimation, including the geometric median.

Paper Structure (14 sections, 16 theorems, 71 equations, 1 figure)

Introduction
Framework
Contributions
Related work
Main results
Conclusion
Intermediate lemmas
Proofs
Proof of Theorem \ref{thm:tightness}
Proof of Proposition \ref{prop: sol EDS}
Proof of Lemma \ref{lem:cv process}
Proof of Corollary \ref{cor:CLT}
Proof of Proposition \ref{thm:bound_var_asymp}
Proof of Theorem \ref{thm:bound_sup_y}

Key Result

Theorem 1

Let the sequence of step-sizes $(t_n)_{n\geq 1}$ satisfy $\sum_{n\geq 1}t_n=\infty$ and $\sum_{n\geq 1}t_n^2<\infty$. Let Assumptions \ref{assump:Phi-a}, \ref{assump:quadratic_growth} and \ref{assump:noise1} hold. Then, $\theta_n\xrightarrow[n\to\infty]{} \theta^*$ almost surely.
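
As a quick check, the step size $t_n = \delta/n$ with $\delta > 0$ (the choice used in the TL;DR) meets both step-size conditions of Theorem 1. This is a standard series computation, not a step taken from the paper's proof:

$$\sum_{n\geq 1} \frac{\delta}{n} = \infty, \qquad \sum_{n\geq 1} \frac{\delta^2}{n^2} = \frac{\delta^2\pi^2}{6} < \infty.$$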

Figures (1)

Figure 1: Stochastic gradient descent trajectory for the estimation of the median of a Laplace$(0,1)$ distribution in $\mathbb{R}^2$ with independent coordinates, based on $n=50000$ samples. The first panel shows the full trajectory. The second panel zooms in on the fluctuations around the minimizer, with the first $2000$ iterations removed. The third panel shows the rescaled trajectory, illustrating the diffusion-like behavior predicted by Theorem \ref{thm:main}. Color indicates time, from light (start) to dark (end). We set the step size to $2/k$.
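
The experiment in Figure 1 can be approximated with a short simulation. The sketch below is a minimal illustration, not the authors' code: it assumes the objective is the geometric median loss $\Phi(\theta) = \mathbb{E}\|X - \theta\|$, uses the step size $2/k$ from the caption, and starts from a hypothetical initial point $(1, 1)$. The function names `sample_laplace` and `sgd_geometric_median` are made up for this example.

```python
import math
import random


def sample_laplace(rng):
    """Draw one Laplace(0, 1) sample as the difference of two Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def sgd_geometric_median(n_steps, delta=2.0, seed=0, start=(1.0, 1.0)):
    """Run SGD on the (assumed) geometric median loss E||X - theta|| in R^2.

    The k-th update uses step size delta / k and the stochastic subgradient
    -(X_k - theta) / ||X_k - theta||. Returns the list of iterates.
    The starting point is a hypothetical choice, not taken from the paper.
    """
    rng = random.Random(seed)
    theta = list(start)
    path = []
    for k in range(1, n_steps + 1):
        # One sample with independent Laplace(0, 1) coordinates.
        x = (sample_laplace(rng), sample_laplace(rng))
        diff = (x[0] - theta[0], x[1] - theta[1])
        norm = math.hypot(*diff)
        if norm > 0.0:  # the subgradient is taken to be 0 when X_k == theta
            step = delta / k
            theta[0] += step * diff[0] / norm
            theta[1] += step * diff[1] / norm
        path.append((theta[0], theta[1]))
    return path


if __name__ == "__main__":
    path = sgd_geometric_median(50000)
    print("final iterate:", path[-1])
```

Plotting `path` in full, then with the first 2000 iterates dropped, gives rough analogues of the first two panels. The rescaling used for the third panel is not specified in the caption, so it is left out of this sketch.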

Theorems & Definitions (28)

Theorem 1
proof
Theorem 2
Proposition 1
Theorem 3
Remark 1
Proposition 2
proof
proof : Proof of Theorem \ref{thm:main}
Lemma 1
...and 18 more
