Rigorous dynamical mean field theory for stochastic gradient descent methods
Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova
TL;DR
This work characterizes the exact high-dimensional behavior of first-order gradient-based methods such as SGD, Langevin dynamics, and momentum methods on Gaussian data. Using iterative Gaussian conditioning, it derives a discrete-time dynamical mean-field theory (DMFT) in which the dynamics are expressed through memory kernels and Gaussian processes whose covariances are fixed by self-consistent equations at every time step. The main contributions are the treatment of stochastic gradient noise, non-separable updates, and general data covariance, together with a numerically tractable solver for the DMFT equations and demonstrations on SGD variants. The results provide a principled framework for analyzing the convergence and stability of training dynamics in the high-dimensional regime, bridging statistical learning and dynamical mean-field theory.
Abstract
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods that learn an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data by empirical risk minimization. This family includes widely used algorithms such as stochastic gradient descent (SGD) and Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, which accommodates datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch size and constant learning rates.
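As a rough illustration of the setting described above (and not of the DMFT equations or their solver), the following sketch runs mini-batch SGD with a constant learning rate and an extensive batch size, meaning a fixed fraction of the n samples, on Gaussian data. The function name `sgd_extensive_batch`, its parameters, the noisy linear teacher, and the ridge-regularized square loss are all assumptions chosen for this example; none of them is taken from the paper.

```python
import numpy as np

def sgd_extensive_batch(n=400, d=200, batch_fraction=0.5, lr=0.1,
                        ridge=0.1, steps=50, seed=0):
    """Mini-batch SGD on a ridge-regularized square loss with Gaussian data.

    Hypothetical illustration: the model, loss, and scalings are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Gaussian design with identity covariance, scaled so that x_i . w = O(1).
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    w_star = rng.standard_normal(d)                  # assumed linear teacher
    y = X @ w_star + 0.1 * rng.standard_normal(n)    # noisy labels

    batch_size = int(batch_fraction * n)  # extensive: proportional to n
    w = np.zeros(d)
    risks = [0.5 * np.mean((X @ w - y) ** 2)]  # full training risk at step 0
    for _ in range(steps):
        # Draw a fresh batch without replacement at every step.
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        # Rescale by 1 / batch_fraction so the batch gradient is an unbiased
        # estimate of the full-sum gradient; then add the ridge term.
        grad = X[idx].T @ residual / batch_fraction + ridge * w
        w = w - lr * grad                 # constant learning rate
        risks.append(0.5 * np.mean((X @ w - y) ** 2))
    return w, np.array(risks)
```

Calling `w, risks = sgd_extensive_batch()` returns the final weights and the training-risk trajectory. In a comparison with a DMFT prediction, one would typically average such trajectories over several seeds and increase n and d at fixed ratio n/d.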
