High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory

Sota Nishiyama, Masaaki Imaizumi

TL;DR

This work develops a dynamical mean-field theory (DMFT) for the high-dimensional stochastic gradient flow (SGF), a continuous-time surrogate for multi-pass stochastic gradient descent with small batches. By analyzing SGF in the proportional limit $n,d\to\infty$ with $n/d\to\delta$, the authors derive a low-dimensional, self-consistent DMFT system that characterizes the asymptotic distribution of SGF parameters and predictions. They establish existence/uniqueness of the DMFT solution, prove convergence of SGF's empirical distribution to this DMFT law, and show how DMFT reduces to existing high-dimensional SGD descriptions in online and linear-regression limits. Numerical experiments on logistic regression demonstrate strong agreement between SGF-DMFT predictions and actual SGD dynamics, supporting the framework's relevance for nonlinear models and planted-signal settings. Overall, the paper offers a unifying, high-dimensional perspective on SGD dynamics, connecting online SGD, linear models, and nonlinear architectures through a tractable DMFT formalism.

Abstract

Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes, and understanding their dynamics in high dimensions is of great interest. However, an analytical framework for describing the high-dimensional asymptotic behavior of multi-pass SGD with small batch sizes for nonlinear models is currently missing. In this study, we address this gap by analyzing the high-dimensional dynamics of a stochastic differential equation called a \emph{stochastic gradient flow} (SGF), which approximates multi-pass SGD in this regime. In the limit where the number of data samples $n$ and the dimension $d$ grow proportionally, we derive a closed system of low-dimensional and continuous-time equations and prove that it characterizes the asymptotic distribution of the SGF parameters. Our theory is based on the dynamical mean-field theory (DMFT) and is applicable to a wide range of models encompassing generalized linear models and two-layer neural networks. We further show that the resulting DMFT equations recover several existing high-dimensional descriptions of SGD dynamics as special cases, thereby providing a unifying perspective on prior frameworks such as online SGD and high-dimensional linear regression. Our proof builds on the existing DMFT technique for gradient flow and extends it to handle the stochasticity in SGF using tools from stochastic calculus.
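
For orientation, one common way to write an SDE surrogate for minibatch SGD is sketched below. This is a generic form under assumed notation (empirical risk $L$, per-sample losses $\ell_i$, learning rate $\eta$, batch size $B$, standard Brownian motion $W_t$); it is not necessarily the paper's exact definition or normalization of SGF.

$$
\mathrm{d}\theta_t = -\nabla L(\theta_t)\,\mathrm{d}t + \sqrt{\tau}\,\Sigma(\theta_t)^{1/2}\,\mathrm{d}W_t, \qquad \tau = \eta / B,
$$

$$
L(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell_i(\theta), \qquad \Sigma(\theta) = \frac{1}{n}\sum_{i=1}^{n} \bigl(\nabla \ell_i(\theta) - \nabla L(\theta)\bigr)\bigl(\nabla \ell_i(\theta) - \nabla L(\theta)\bigr)^{\top}.
$$

In this form the drift is full-batch gradient flow, and the diffusion term matches the covariance of the minibatch gradient noise accumulated over a time step of length $\eta$, so the noise strength is governed by the ratio $\eta / B$.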

Paper Structure

This paper contains 82 sections, 21 theorems, 328 equations, 2 figures, and 1 table.

Key Result

Theorem 1

Suppose Assumptions ass:data and ass:func hold. Then, there exists some $T_* > 0$ such that for any $T \in [0, T_*]$, the DMFT equation $\mathfrak{S}$ admits a unique bounded fixed point $(C_\theta, \Sigma_\ell, R_\theta, R_\ell, \Gamma)$ on the interval $[0,T]$. Moreover, the stochastic processes ...

Figures (2)

  • Figure 1: A logical roadmap of our framework. We model the SGD dynamics with the stochastic gradient flow (SGF), and analyze the SGF in high dimensions. We show that the empirical distribution of SGF parameters converges to the DMFT solution (Theorem 2), which uniquely exists (Theorem 1).
  • Figure 2: Train (left) and test (right) error dynamics of SGD for logistic regression with various temperature values $\tau = \eta / B$. (Solid) Average errors of $10$ trials of SGD with $d=1024$ and $n=2048$. Shaded regions represent one standard deviation. (Dotted) Predictions from the DMFT equation.
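
The SGD side of the experiment in Figure 2 can be approximated with a short simulation. The following is a minimal sketch, not the authors' code: the data model (Gaussian inputs scaled by $1/\sqrt{d}$, a Gaussian planted signal `theta_star`, labels in $\{0,1\}$ drawn from a logistic model), the zero initialization, sampling with replacement, and the function names `make_data`, `logistic_loss`, and `run_sgd` are all assumptions made for illustration. Only the parameterization by the temperature $\tau = \eta / B$ is taken from the caption.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def make_data(n, d, rng):
    # Assumed data model: planted signal, inputs with entries of variance 1/d.
    theta_star = rng.standard_normal(d)
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = (rng.random(n) < sigmoid(X @ theta_star)).astype(float)
    return X, y, theta_star


def logistic_loss(theta, X, y):
    # Mean logistic loss for labels in {0, 1}: log(1 + e^z) - y * z.
    z = X @ theta
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def run_sgd(X, y, tau, batch, steps, rng):
    # Multi-pass SGD: minibatches are resampled from the same n points.
    # The learning rate is set from the temperature via eta = tau * batch.
    n, d = X.shape
    eta = tau * batch
    theta = np.zeros(d)
    losses = [logistic_loss(theta, X, y)]
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)  # sample with replacement
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (sigmoid(Xb @ theta) - yb) / batch
        theta = theta - eta * grad
        losses.append(logistic_loss(theta, X, y))
    return theta, losses
```

Calling `run_sgd` for several values of `tau` with `d=1024` and `n=2048`, and averaging the returned train losses over independent seeds, gives curves of the kind shown as solid lines in the figure. Learning-rate and time scalings here are not matched to the paper's conventions, so the curves are only qualitatively comparable.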

Theorems & Definitions (33)

  • Theorem 1: Existence and uniqueness of the DMFT equation
  • Theorem 2: DMFT characterization of SGF
  • Corollary 3: DMFT characterization of SGF with a planted signal
  • Proposition 4.1: DMFT characterization of SGD for linear regression
  • Definition C.1: Admissible space $\mathcal{S}_\theta(T)$
  • Definition C.2: Admissible space $\mathcal{S}_\ell(T)$
  • Lemma C.3
  • proof
  • Lemma C.4
  • Lemma C.5
  • ...and 23 more