Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
Parsa Rangriz
TL;DR
This work addresses the behavior of online stochastic gradient descent for high-dimensional, single-layer networks by developing a diffusion-limit framework with localizability and asymptotic closability. It identifies a critical step-size regime $\delta_N=1/N$ in which a correction term appears, causing the effective dynamics to deviate from the deterministic gradient flow described by DMFT. The authors prove an ODE limit for the summary statistics $u_N=(m,r_\perp^2)$ with a population drift $\mathcal{F}$ and corrector $\mathcal{G}$, and, in a microlocal regime near fixed points, establish a limiting SDE that reduces to an Ornstein–Uhlenbeck process under suitable conditions. For activation functions with information exponent $k>2$, Gaussian initialization leads to $m(t)=0$ and a fixed-point radius $r_\perp^*$, illustrating the key role of stochastic fluctuations in high-dimensional learning and clarifying the limitations of deterministic ballistic scaling in capturing the full dynamics.
Abstract
This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Building on the seminal work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD corresponding to the gradient flow of the population loss, we focus on the critical scaling regime of the step size. Below this critical scale, the effective dynamics are governed by ballistic (ODE) limits, but at the critical scale, a new correction term appears that changes the phase diagram. In this regime, near the fixed points, the corresponding diffusive (SDE) limits of the effective dynamics reduce to an Ornstein-Uhlenbeck process under certain conditions. These results highlight how the information exponent controls sample complexity and illustrate the limitations of deterministic scaling limits in capturing the stochastic fluctuations of high-dimensional learning dynamics.
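To make the objects in the abstract concrete, the following is a minimal sketch of online SGD with step size $\delta_N = c/N$ that records the summary statistics $(m, r_\perp^2)$ along the trajectory. Everything specific here is an illustrative assumption rather than the paper's setup: the teacher-student squared loss, the `tanh` activation, the teacher direction $e_1$, the constant `c`, and the function names `summary_stats` and `online_sgd`.

```python
import math
import random


def summary_stats(w, v):
    """Return (m, r_perp^2): overlap with the unit-norm teacher v and
    squared norm of the component of w orthogonal to v."""
    m = sum(wi * vi for wi, vi in zip(w, v))
    r_perp_sq = sum(wi * wi for wi in w) - m * m
    return m, r_perp_sq


def online_sgd(N=200, c=0.5, T=1.0, seed=0):
    """Run online SGD up to rescaled time T with step size delta = c / N.

    Assumed model (hypothetical, for illustration only): labels are
    tanh(v . x) with x ~ N(0, I_N), and the student minimizes the
    per-sample loss 0.5 * (tanh(w . x) - tanh(v . x))**2.
    """
    rng = random.Random(seed)
    v = [1.0] + [0.0] * (N - 1)  # assumed teacher direction e_1
    # Gaussian initialization with |w|^2 of order one
    w = [rng.gauss(0.0, 1.0) / math.sqrt(N) for _ in range(N)]
    delta = c / N
    steps = round(T * N / c)  # number of SGD steps to reach time T
    traj = [summary_stats(w, v)]
    for _ in range(steps):
        # Online setting: a fresh sample is drawn at every step
        x = [rng.gauss(0.0, 1.0) for _ in range(N)]
        pre_w = sum(wi * xi for wi, xi in zip(w, x))
        pre_v = sum(vi * xi for vi, xi in zip(v, x))
        err = math.tanh(pre_w) - math.tanh(pre_v)
        # Chain rule through tanh: d/dz tanh(z) = 1 - tanh(z)^2
        grad_scale = err * (1.0 - math.tanh(pre_w) ** 2)
        w = [wi - delta * grad_scale * xi for wi, xi in zip(w, x)]
        traj.append(summary_stats(w, v))
    return traj


if __name__ == "__main__":
    traj = online_sgd()
    m_end, r_end = traj[-1]
    print(f"final m = {m_end:.4f}, final r_perp^2 = {r_end:.4f}")
```

Repeating the run for increasing `N` at fixed `c` and `T`, and plotting $(m, r_\perp^2)$ against rescaled time $t = k\,\delta_N$, is one informal way to see the trajectories concentrate as the dimension grows. The sketch does not implement the drift $\mathcal{F}$ or the corrector $\mathcal{G}$ themselves.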
