Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
Rodrigo Veiga, Anastasia Remizova, Nicolas Macris
TL;DR
This work analyzes the test-risk dynamics of stochastic gradient flow (SGF) in the small-learning-rate regime by modeling SGD as a continuous-time Itô process and applying a path-integral (Laplace) approximation. It derives a general covariance formula for fluctuations around the pure gradient flow (GF) trajectory, which enables an explicit comparison between GF and SGF test risks. The theory is then applied to a weak-features regression model that exhibits double descent, yielding closed-form expressions in terms of Marchenko–Pastur spectra and time integrals; the resulting SGF corrections agree well with the deviations from GF observed in SGD simulations. Overall, the paper provides a tractable, analytically controlled framework for quantifying how stochasticity in SGD reshapes generalization dynamics over the entire training horizon, including its effect on double-descent curves in overparameterized settings, with potential extensions to more complex models and activation schemes.
Abstract
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation, we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
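
To make the comparison described above concrete, the following is a minimal sketch of how a gradient-flow test-risk curve can be set against a discrete-time SGD run on the time axis t = k * eta. It is not the weak-features model of the paper: it assumes a plain least-squares problem with Gaussian inputs and a linear teacher, and the function name `gf_vs_sgd_risk`, the data model, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def gf_vs_sgd_risk(n=200, d=50, eta=0.01, t_max=5.0, noise=0.1, seed=0):
    """Test risk along gradient flow (closed form) and single-sample SGD."""
    rng = np.random.default_rng(seed)

    # Hypothetical toy data: Gaussian inputs, linear teacher, label noise.
    X = rng.standard_normal((n, d))
    theta_star = rng.standard_normal(d)
    y = X @ theta_star + noise * rng.standard_normal(n)

    def test_risk(theta):
        # Population risk for a fresh x ~ N(0, I) with independent label noise.
        diff = theta - theta_star
        return 0.5 * (np.sum(diff**2, axis=-1) + noise**2)

    # Gradient flow on the empirical risk (1/2n)||X theta - y||^2 from theta(0) = 0
    # has the closed form theta(t) = (I - exp(-H t)) theta_min, H = X^T X / n.
    H = X.T @ X / n
    theta_min = np.linalg.lstsq(X, y, rcond=None)[0]  # min-norm least squares
    evals, evecs = np.linalg.eigh(H)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    coeffs = evecs.T @ theta_min

    steps = int(round(t_max / eta))
    times = eta * np.arange(steps + 1)  # SGD step k corresponds to time k * eta
    relax = 1.0 - np.exp(-np.outer(times, evals))  # shape (steps + 1, d)
    risk_gf = test_risk((relax * coeffs) @ evecs.T)

    # Single-sample SGD on the same empirical risk, started from theta = 0.
    theta = np.zeros(d)
    risk_sgd = np.empty(steps + 1)
    risk_sgd[0] = test_risk(theta)
    for k in range(steps):
        i = rng.integers(n)
        theta -= eta * (X[i] @ theta - y[i]) * X[i]
        risk_sgd[k + 1] = test_risk(theta)

    return times, risk_gf, risk_sgd


if __name__ == "__main__":
    times, risk_gf, risk_sgd = gf_vs_sgd_risk()
    print("final GF risk: ", risk_gf[-1])
    print("final SGD risk:", risk_sgd[-1])
```

In this toy setting the gap `risk_sgd - risk_gf` plays the role of the stochastic correction; averaging it over several seeds and shrinking `eta` would be one way to probe the small-learning-rate regime that the analysis addresses.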
