Dyson Equation for Correlated Linearizations and Test Error of Random Features Regression

Hugo Latourelle-Vigeant; Elliot Paquette

TL;DR

This work develops a Dyson equation framework for correlated linearizations (DEL) to obtain deterministic equivalents for functionals of random matrices, with a focus on the empirical test error of random features ridge regression in high-dimensional proportional settings. It proves existence-uniqueness and stability of a matrix-valued solution M(z) to the DEL, and establishes an anisotropic global law that yields deterministic equivalents for pseudo-resolvents under general correlation structures. The framework is then applied to conditioned random features models (including deep random features), producing Gaussian equivalence results and new insights into implicit regularization and kernel connections. The results enable practical computation of the test error through scalar fixed-point iterations and provide evidence of accuracy on real datasets, while highlighting avenues for relaxing design assumptions and extending to broader activation functions. Overall, the paper bridges random matrix theory and learning theory to rigorously characterize test error in correlated, high-dimensional random feature models.
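To make the phrase "scalar fixed-point iterations" concrete, here is a minimal sketch of such an iteration on the simplest textbook Dyson equation, the one for the Wigner semicircle law, $m(z) = -1/(z + m(z))$ with $z$ in the upper half-plane. This is an illustration only and is not the DEL or the scalar system derived in the paper; the function names `semicircle_stieltjes` and `semicircle_closed_form` are hypothetical.

```python
import cmath


def semicircle_stieltjes(z, tol=1e-12, max_iter=10_000):
    """Iterate m <- -1 / (z + m) for Im(z) > 0.

    Illustrative scalar Dyson equation (semicircle law), not the
    equation system from the paper.
    """
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    m = 1j  # starting point with positive imaginary part
    for _ in range(max_iter):
        m_next = -1.0 / (z + m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


def semicircle_closed_form(z):
    """Root of m^2 + z*m + 1 = 0 with positive imaginary part."""
    root = cmath.sqrt(z * z - 4.0)
    candidates = ((-z + root) / 2.0, (-z - root) / 2.0)
    return max(candidates, key=lambda m: m.imag)


z = 0.5 + 0.5j
m_iter = semicircle_stieltjes(z)
m_exact = semicircle_closed_form(z)
```

The map keeps iterates in the upper half-plane, and convergence slows as the imaginary part of `z` shrinks, so a small `tol` with a generous `max_iter` is a reasonable default for a sketch like this.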

Abstract

This paper develops theory for the Dyson equation for correlated linearizations and uses it to derive an asymptotic deterministic equivalent for the test error in random features regression. The theory developed for the correlated Dyson equation includes existence-uniqueness, spectral support bounds, and stability properties. It provides a new way to construct deterministic equivalents for pseudo-resolvents of a class of linearizations with correlated entries. In the application, the theory yields a deterministic equivalent of the test error in random features ridge regression, in a proportional scaling regime, conditional on both the training and test datasets.

Paper Structure (48 sections, 57 theorems, 181 equations, 2 figures)

Introduction
Main Contributions
Notation
Organization
Dyson Equation for Correlated Linearizations
Assumptions for a General Anisotropic Global Law
Anisotropic Global Law
Empirical Test Error of Random Features Ridge Regression
Boundedness Assumptions
Bounded Denominator
Data Assumptions and Real-World Relevance
Extension to Deep Random Features
Gaussian Equivalence
Global Anisotropic Law for Rectangular Random Matrices
Implicit Regularization and Relation to Kernel Regression
...and 33 more sections

Key Result

Theorem 2.1

There exists a unique analytic matrix-valued function $M\in \mathcal{M}$ such that $M(z)$ solves the DEL \eqref{eq:DEL} for every $z\in \mathbb{H}$. Additionally, for all $z\in \mathbb{H}$, where $\Omega$ is a compactly supported matrix-valued measure on bounded Borel subsets of $\mathbb{R}$ satisfying

Figures (2)

Figure 1: $E_{\text{test}}$ vs. the deterministic approximation given in \ref{theorem:rf_error} for various odd activation functions with different sizes of hidden layers $d$ and ridge parameter $\delta$. The data matrices, as well as the response variables, are sampled from a synthetic regression dataset with $n_{\text{train}}=n_{\text{test}}=n_{0}=1000$. Left: error function activation ($\sigma(x)=\mathrm{erf}(x)$); Right: sign activation ($\sigma(x)=\mathrm{sign}(x)$).
Figure 2: $E_{\text{test}}$ vs. the deterministic approximation given in \ref{theorem:rf_error} for various flattened image classification datasets with different sizes of hidden layers $d$ and ridge parameter $\delta$. Sine activation ($\sigma=\sin$), $n_{\text{train}}=1500$, $n_{\text{test}}=1000$. Upper left: MNIST [deng2012mnist]; Upper right: Fashion-MNIST [xiao2017fashionmnist]; Lower left: CIFAR-10 [Krizhevsky09learningmultiple]; Lower right: CIFAR-100 [Krizhevsky09learningmultiple].
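
The empirical quantity $E_{\text{test}}$ plotted in the figures can be approximated numerically along the following lines. This is a minimal sketch under stated assumptions, not the authors' experimental code: the weight scaling $1/\sqrt{n_0}$, the mean-squared-error normalization, the synthetic data, and the function name `rf_test_error` are all choices made here for illustration. The sine activation matches the one named in Figure 2.

```python
import numpy as np


def rf_test_error(X_train, y_train, X_test, y_test, d, delta, seed=0):
    """Empirical test error of random features ridge regression.

    Assumptions (not taken from the paper): Gaussian first-layer weights
    scaled by 1/sqrt(n0), sine activation, and test error measured as
    the mean squared residual.
    """
    rng = np.random.default_rng(seed)
    n0 = X_train.shape[1]
    W = rng.standard_normal((n0, d)) / np.sqrt(n0)  # random, untrained layer

    F_train = np.sin(X_train @ W)  # random features on the training set
    F_test = np.sin(X_test @ W)    # same weights applied to the test set

    # Ridge solution: (F^T F + delta * I) w = F^T y
    gram = F_train.T @ F_train + delta * np.eye(d)
    w = np.linalg.solve(gram, F_train.T @ y_train)

    residual = F_test @ w - y_test
    return float(np.mean(residual ** 2))


# Small synthetic example (sizes chosen for speed, not to match the figures).
rng = np.random.default_rng(1)
n_train, n_test, n0 = 200, 100, 50
X_train = rng.standard_normal((n_train, n0))
X_test = rng.standard_normal((n_test, n0))
beta = rng.standard_normal(n0) / np.sqrt(n0)
y_train = X_train @ beta + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ beta + 0.1 * rng.standard_normal(n_test)

err = rf_test_error(X_train, y_train, X_test, y_test, d=300, delta=1.0)
```

Sweeping `d` and `delta` over a grid and plotting `err` against a deterministic prediction would reproduce the general shape of the comparison described in the captions, subject to the normalization assumptions above.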

Theorems & Definitions (103)

Theorem 2.1: Main Properties
Lemma 2.1
Theorem 2.2: Global Anisotropic Law for Pseudo-resolvents
Theorem 2.3
Lemma 4.1
proof
Corollary 4.1
Lemma 4.2
proof
Lemma 4.3
...and 93 more
