High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory
Sota Nishiyama, Masaaki Imaizumi
TL;DR
This work develops a dynamical mean-field theory (DMFT) for high-dimensional stochastic gradient flow (SGF), a continuous-time surrogate for multi-pass stochastic gradient descent (SGD) with small batches. Analyzing SGF in the proportional limit $n,d\to\infty$ with $n/d\to\delta$, where $n$ is the number of samples and $d$ the dimension, the authors derive a low-dimensional, self-consistent DMFT system that characterizes the asymptotic distribution of the SGF parameters and predictions. They establish existence and uniqueness of the DMFT solution, prove that the empirical distribution of SGF converges to this DMFT law, and show that the DMFT equations reduce to existing high-dimensional descriptions of SGD in the online and linear-regression limits. Numerical experiments on logistic regression show close agreement between the DMFT predictions for SGF and actual SGD dynamics, supporting the framework's relevance for nonlinear models and planted-signal settings. Overall, the paper offers a unifying high-dimensional perspective on SGD dynamics, connecting online SGD, linear models, and nonlinear architectures through a tractable DMFT formalism.
Abstract
Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes, and understanding their dynamics in high dimensions is of great interest. However, an analytical framework for describing the high-dimensional asymptotic behavior of multi-pass SGD with small batch sizes for nonlinear models is currently missing. In this study, we address this gap by analyzing the high-dimensional dynamics of a stochastic differential equation called a \emph{stochastic gradient flow} (SGF), which approximates multi-pass SGD in this regime. In the limit where the number of data samples $n$ and the dimension $d$ grow proportionally, we derive a closed system of low-dimensional, continuous-time equations and prove that it characterizes the asymptotic distribution of the SGF parameters. Our theory is based on dynamical mean-field theory (DMFT) and is applicable to a wide range of models, including generalized linear models and two-layer neural networks. We further show that the resulting DMFT equations recover several existing high-dimensional descriptions of SGD dynamics as special cases, thereby providing a unifying perspective on prior frameworks such as online SGD and high-dimensional linear regression. Our proof builds on the existing DMFT technique for gradient flow and extends it to handle the stochasticity in SGF using tools from stochastic calculus.
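
As a rough illustration of the setting described above, the following sketch runs multi-pass, small-batch SGD on a planted logistic regression problem with $n/d = 2$ and records the overlap between the iterate and the planted signal after each epoch. It is not the authors' experimental code: the function name `simulate_multipass_sgd`, the Gaussian data model, the unit-norm planted vector, the $1/d$ step-size scaling, and all default values are assumptions chosen for illustration, and no DMFT prediction is computed here.

```python
import numpy as np


def simulate_multipass_sgd(n=400, d=200, lr=0.5, batch_size=1, epochs=5, seed=0):
    """Multi-pass minibatch SGD on planted logistic regression.

    Illustrative sketch only: the data model, scalings, and defaults are
    assumptions, not taken from the paper.
    Returns the overlap <theta, theta_star> measured after each epoch.
    """
    rng = np.random.default_rng(seed)

    # Planted signal with unit norm (assumed normalization).
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)

    # Gaussian covariates; labels drawn from a logistic model on the signal.
    X = rng.standard_normal((n, d))
    prob = 1.0 / (1.0 + np.exp(-(X @ theta_star)))
    y = (rng.random(n) < prob).astype(float)

    theta = np.zeros(d)
    overlaps = []
    for _ in range(epochs):
        # One pass over the same n samples in a fresh random order.
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            logits = X[idx] @ theta
            residual = 1.0 / (1.0 + np.exp(-logits)) - y[idx]
            grad = X[idx].T @ residual / len(idx)
            # Step size scaled by 1/d because ||x||^2 is of order d (assumed scaling).
            theta -= (lr / d) * grad
        overlaps.append(float(theta @ theta_star))
    return np.array(overlaps)
```

Calling `simulate_multipass_sgd()` returns one overlap value per epoch. Repeating the run with larger $n$ and $d$ at a fixed ratio $n/d$ is one way to check empirically whether such trajectories stabilize, which is the kind of behavior a proportional-limit theory is meant to describe.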
