Functional Central Limit Theorem for Stochastic Gradient Descent
Kessang Flamand, Victor-Emmanuel Brunel
TL;DR
This work characterizes the full asymptotic behavior of SGD trajectories for convex objectives, not merely the final iterate. It proves a functional central limit theorem: with step size $t_n = \delta/n$, the rescaled SGD path converges to a diffusion process $\{Y_t\}$ on $(0,\infty)$ satisfying the SDE $dY_t = -t^{-1} H Y_t\,dt + \Sigma^{1/2}\,dB_t$, where $H = \delta\nabla^2\Phi(\theta^*) - I_d$ and $\Sigma = \delta^2\Gamma$. This yields Gaussian fluctuations of the trajectory and, in particular, $\sqrt{n}(\hat{\theta}_n - \theta^*) \Rightarrow N(0,\Sigma)$, giving a trajectory-level counterpart to classical CLTs that also applies to non-smooth robust objectives such as the geometric median. The result provides a diffusion-based picture of long-run SGD behavior under mild convexity assumptions. Two limitations are noted: choosing $\delta$ requires knowledge of the local curvature, and the asymptotic variance of standard SGD may exceed that of ERM benchmarks. Future work includes extending the result to averaging schemes (e.g., Polyak-Ruppert) to attain asymptotic efficiency.
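As a heuristic check of where the drift $-t^{-1}H Y_t$ and the diffusion coefficient $\Sigma^{1/2}$ come from, the sketch below linearizes the SGD recursion near $\theta^*$. It is not the paper's proof. The symbols $A$, $\xi_k$, $Z_k$ and the specific rescaling $Y^n_t$ are assumptions introduced here for illustration, as is the reading of $\Gamma$ as the covariance of the gradient noise at $\theta^*$.

```latex
% Heuristic sketch only. Assumed notation (not taken from the source):
%   A     = \nabla^2\Phi(\theta^*)
%   \xi_k = mean-zero gradient noise with covariance \Gamma
%   Z_k   = k(\theta_k - \theta^*),  Y^n_t = n^{-1/2} Z_{\lfloor nt \rfloor}
% The step from \theta_k to \theta_{k+1} uses \delta/k; shifting the index
% to \delta/(k+1) only changes lower-order terms.
\begin{align*}
\theta_{k+1} - \theta^*
  &\approx \Bigl(I_d - \tfrac{\delta}{k} A\Bigr)(\theta_k - \theta^*)
     - \tfrac{\delta}{k}\,\xi_{k+1}
  && \text{(linearized SGD step)} \\
Z_{k+1}
  &\approx \Bigl(1 + \tfrac{1}{k}\Bigr)\Bigl(I_d - \tfrac{\delta}{k} A\Bigr) Z_k
     - \delta\Bigl(1 + \tfrac{1}{k}\Bigr)\xi_{k+1}
  && \text{(multiply by } k+1\text{)} \\
Z_{k+1} - Z_k
  &\approx -\tfrac{1}{k}\,\underbrace{(\delta A - I_d)}_{H}\, Z_k - \delta\,\xi_{k+1}
  && \text{(drop lower-order terms)}
\end{align*}
% With k = nt and \Delta t = 1/n we have 1/k = \Delta t / t, so
\begin{align*}
Y^n_{t+\Delta t} - Y^n_t
  &\approx -\tfrac{\Delta t}{t}\, H\, Y^n_t - \delta\, n^{-1/2}\,\xi_{k+1},
  &
\operatorname{Cov}\bigl(\delta\, n^{-1/2}\,\xi_{k+1}\bigr)
  &= \delta^2\Gamma\,\Delta t = \Sigma\,\Delta t ,
\end{align*}
% which is the Euler scheme of  dY_t = -t^{-1} H Y_t\,dt + \Sigma^{1/2}\,dB_t .
```

Under this assumed rescaling, $Y^n_1 = \sqrt{n}(\theta_n - \theta^*)$, so the time-1 marginal of the limit process corresponds to the classical last-iterate fluctuation.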
Abstract
We study the asymptotic shape of the trajectory of the stochastic gradient descent algorithm applied to a convex objective function. Under mild regularity assumptions, we prove a functional central limit theorem for the properly rescaled trajectory. Our result characterizes the long-term fluctuations of the algorithm around the minimizer by providing a diffusion limit for the trajectory. In contrast with classical central limit theorems for the last iterate or Polyak-Ruppert averages, this functional result captures the temporal structure of the fluctuations and applies to non-smooth settings such as robust location estimation, including the geometric median.
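To make the notion of a rescaled trajectory concrete, here is a minimal simulation sketch on a toy problem. Everything in it is an assumption for illustration rather than the paper's setup: the one-dimensional quadratic objective $\Phi(\theta) = a\theta^2/2$ with minimizer $\theta^* = 0$, the Gaussian gradient noise, the function name `rescaled_sgd_path`, and the rescaling $t \mapsto t\sqrt{n}\,(\theta_{\lfloor nt \rfloor} - \theta^*)$, which is one choice consistent with a drift of the form $-t^{-1}(\delta a - 1)Y_t$.

```python
import numpy as np


def rescaled_sgd_path(n, delta=2.0, a=1.0, noise_std=1.0, theta0=1.0, seed=0):
    """Run n SGD steps on Phi(theta) = a * theta**2 / 2 and rescale the path.

    Hypothetical toy setup: the minimizer is theta* = 0 and the stochastic
    gradient is a * theta plus independent Gaussian noise.
    Returns (t, y) with t[k-1] = k / n and y[k-1] = t * sqrt(n) * theta_k.
    """
    rng = np.random.default_rng(seed)
    theta = theta0
    thetas = np.empty(n)
    for k in range(1, n + 1):
        # Noisy gradient of the quadratic objective at the current iterate.
        grad = a * theta + noise_std * rng.standard_normal()
        # Step size delta / k, matching the schedule t_n = delta / n.
        theta -= (delta / k) * grad
        thetas[k - 1] = theta
    t = np.arange(1, n + 1) / n
    # Assumed rescaling: time compressed to (0, 1], space blown up by t * sqrt(n).
    y = t * np.sqrt(n) * thetas
    return t, y


if __name__ == "__main__":
    t, y = rescaled_sgd_path(n=10_000, seed=1)
    print("rescaled value at t = 1:", y[-1])
```

Running the function with many different seeds and overlaying the resulting `(t, y)` curves gives an empirical picture of the trajectory-level fluctuations that a functional limit theorem describes, as opposed to the single number obtained from the last iterate alone.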
