Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models

Alexander Atanasov, Blake Bordelon, Jacob A. Zavatone-Veth, Courtney Paquette, Cengiz Pehlevan

TL;DR

This work derives a two-point deterministic equivalence for random-matrix resolvents and uses it to obtain sharp, unified asymptotics for the training and generalization dynamics of high-dimensional linear models trained with SGD, including linear regression and linear random feature models. Combining the two-resolvent (two-point) analysis with S-transform techniques from free probability, the authors obtain explicit forcing and kernel terms, covariate-shift formulas, and both dynamic and static limits, and relate these results to dynamical mean field theory. The results recover known asymptotics and yield novel predictions for finite data/width settings. The accompanying diagrammatic framework applies to stochastic gradient dynamics across linear-model regimes with structured data and features.

Abstract

We derive a novel deterministic equivalence for the two-point function of a random matrix resolvent. Using this result, we give a unified derivation of the performance of a wide variety of high-dimensional linear models trained with stochastic gradient descent. This includes high-dimensional linear regression, kernel regression, and linear random feature models. Our results include previously known asymptotics as well as novel ones.
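
To make "deterministic equivalence" concrete, the sketch below checks numerically the classical one-point version, not the paper's two-point result. It assumes isotropic Gaussian covariates, for which the normalized trace of the resolvent of the sample covariance concentrates around the solution of the Marchenko-Pastur self-consistent equation. The function names, dimensions, and ridge value are illustrative choices and do not come from the paper.

```python
import numpy as np


def empirical_resolvent_trace(n_samples, dim, ridge, seed=0):
    """Return (1/D) tr[(Sigma_hat + ridge * I)^{-1}] for one random draw.

    Assumption: rows of X are i.i.d. standard Gaussian (identity covariance).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, dim))
    sample_cov = X.T @ X / n_samples
    resolvent = np.linalg.inv(sample_cov + ridge * np.eye(dim))
    return np.trace(resolvent) / dim


def mp_resolvent_trace(q, ridge):
    """Deterministic equivalent of the trace above, with q = D / N.

    The Marchenko-Pastur equation m = 1 / (1 - q + ridge + q * ridge * m)
    is the quadratic q*ridge*m^2 + (1 - q + ridge)*m - 1 = 0; we take its
    positive root.
    """
    b = 1.0 - q + ridge
    return (-b + np.sqrt(b * b + 4.0 * q * ridge)) / (2.0 * q * ridge)


if __name__ == "__main__":
    n_samples, dim, ridge = 800, 400, 0.5
    empirical = empirical_resolvent_trace(n_samples, dim, ridge)
    theory = mp_resolvent_trace(dim / n_samples, ridge)
    print(f"empirical: {empirical:.4f}   deterministic equivalent: {theory:.4f}")
```

At these sizes the random trace and the deterministic prediction should agree to within a few percent, and the gap shrinks as the dimensions grow at a fixed ratio. A two-point equivalence plays the analogous role for products of two resolvents, which this sketch does not attempt to reproduce.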

Paper Structure

This paper contains 36 sections, 119 equations.