Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models
Alexander Atanasov, Blake Bordelon, Jacob A. Zavatone-Veth, Courtney Paquette, Cengiz Pehlevan
TL;DR
This work develops a two-point deterministic equivalence for random-matrix resolvents and uses it to derive sharp, unified asymptotics for the training and generalization dynamics of high-dimensional linear models trained with SGD, including linear regression and linear random feature models. Combining the two-point (two-resolvent) analysis with S-transform techniques from free probability, the authors obtain explicit forcing and kernel terms, formulas under covariate shift, and both dynamic and static limits, and they connect these results to dynamical mean field theory. The results recover known asymptotics and yield novel predictions for settings with finite data and finite width. The paper also establishes a diagrammatic framework for analyzing stochastic gradient dynamics across linear-model regimes with rich data and feature structure, linking random matrix theory to learning dynamics.
Abstract
We derive a novel deterministic equivalence for the two-point function of a random matrix resolvent. Using this result, we give a unified derivation of the performance of a wide variety of high-dimensional linear models trained with stochastic gradient descent. This includes high-dimensional linear regression, kernel regression, and linear random feature models. Our results include previously known asymptotics as well as novel ones.
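For readers unfamiliar with the term, the following is a minimal numerical sketch of what a deterministic equivalence means in the simplest setting. It does not reproduce the paper's two-point result. It checks only the classical one-point statement for isotropic Gaussian data, where the normalized trace of the resolvent of a sample covariance matrix concentrates around the root of the Marchenko-Pastur self-consistent equation. The function names, the isotropic-covariance assumption, and the parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np


def empirical_resolvent_trace(n, d, lam, seed=0):
    """Return (1/d) tr[(X^T X / n + lam I)^{-1}] for one Gaussian draw.

    X has shape (n, d) with i.i.d. standard normal entries, so the
    population covariance is the identity (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    # Eigenvalues of the d x d sample covariance matrix.
    eigs = np.linalg.eigvalsh(X.T @ X / n)
    return float(np.mean(1.0 / (eigs + lam)))


def deterministic_equivalent(q, lam):
    """Positive root m of q*lam*m^2 + (lam + 1 - q)*m - 1 = 0, with q = d/n.

    This is the Marchenko-Pastur Stieltjes transform evaluated at -lam,
    i.e. the large-n, large-d limit of the trace above at fixed q.
    """
    a = lam + 1.0 - q
    return float((-a + np.sqrt(a * a + 4.0 * q * lam)) / (2.0 * q * lam))


if __name__ == "__main__":
    n, d, lam = 800, 400, 0.1
    emp = empirical_resolvent_trace(n, d, lam)
    det = deterministic_equivalent(d / n, lam)
    print(f"empirical: {emp:.4f}   deterministic equivalent: {det:.4f}")
```

A single random draw already agrees closely with the deterministic formula at these sizes, which is the sense in which a random resolvent can be replaced by a deterministic object. The paper's contribution concerns the analogous statement for products involving two resolvents, which this sketch does not attempt.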
