Dyson Equation for Correlated Linearizations and Test Error of Random Features Regression

Hugo Latourelle-Vigeant, Elliot Paquette

TL;DR

This work develops a Dyson equation framework for correlated linearizations (DEL) to obtain deterministic equivalents for functionals of random matrices, with a focus on the empirical test error of random features ridge regression in high-dimensional proportional settings. It proves existence-uniqueness and stability of a matrix-valued solution M(z) to the DEL, and establishes an anisotropic global law that yields deterministic equivalents for pseudo-resolvents under general correlation structures. The framework is then applied to conditioned random features models (including deep random features), producing Gaussian equivalence results and new insights into implicit regularization and kernel connections. The results enable practical computation of the test error through scalar fixed-point iterations and provide evidence of accuracy on real datasets, while highlighting avenues for relaxing design assumptions and extending to broader activation functions. Overall, the paper bridges random matrix theory and learning theory to rigorously characterize test error in correlated, high-dimensional random feature models.

Abstract

This paper develops a theory of the Dyson equation for correlated linearizations and uses it to derive an asymptotic deterministic equivalent for the test error in random features regression. The theory developed for the correlated Dyson equation includes existence-uniqueness, spectral support bounds, and stability properties. This theory is new for constructing deterministic equivalents for pseudo-resolvents of a class of linearizations with correlated entries. In the application, the theory gives a deterministic equivalent of the test error in random features ridge regression in a proportional scaling regime, conditional on both the training and test datasets.
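
To give a feel for what solving a Dyson equation by fixed-point iteration looks like, here is a minimal sketch for the simplest scalar case, the semicircle-law equation $-1/m(z) = z + m(z)$ on the upper half-plane. This is a textbook toy example, not the matrix-valued DEL studied in the paper; the function names `solve_scalar_dyson` and `semicircle_stieltjes` are hypothetical and chosen only for illustration.

```python
import cmath


def solve_scalar_dyson(z, tol=1e-12, max_iter=10_000):
    """Iterate m <- -1 / (z + m) for a spectral parameter z with Im z > 0.

    Toy scalar Dyson equation (semicircle law), NOT the paper's DEL.
    The map sends the upper half-plane into itself, so every iterate
    keeps a positive imaginary part.
    """
    z = complex(z)
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    m = 1j  # any starting point with positive imaginary part
    for _ in range(max_iter):
        m_next = -1.0 / (z + m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    raise RuntimeError("fixed-point iteration did not converge")


def semicircle_stieltjes(z):
    """Closed-form check: the root of m^2 + z*m + 1 = 0 with Im m > 0."""
    z = complex(z)
    root = cmath.sqrt(z * z - 4)
    candidates = ((-z + root) / 2, (-z - root) / 2)
    return max(candidates, key=lambda m: m.imag)


if __name__ == "__main__":
    z = 0.5 + 1.0j
    print(solve_scalar_dyson(z), semicircle_stieltjes(z))
```

The iteration slows down as $z$ approaches the real axis inside the spectrum, which is one reason stability results for the solution matter in practice.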

Paper Structure (48 sections, 57 theorems, 181 equations, 2 figures)

This paper contains 48 sections, 57 theorems, 181 equations, 2 figures.

Key Result

Theorem 2.1

There exists a unique analytic matrix-valued function $M\in \mathcal{M}$ such that $M(z)$ solves the DEL (\ref{eq:DEL}) for every $z\in \mathbb{H}$. Additionally, for all $z\in \mathbb{H}$, where $\Omega$ is a compactly supported matrix-valued measure on bounded Borel subsets of $\mathbb{R}$ satisfying

Figures (2)

  • Figure 1: $E_{\text{test}}$ vs the deterministic approximation given in \ref{theorem:rf_error} for various odd activation functions with different sizes of hidden layers $d$ and ridge parameter $\delta$. The data matrices, as well as the response variables, are sampled from a synthetic regression dataset, $n_{\text{train}}=n_{\text{test}}=n_{0}=1000$. Left: Error function activation ($\sigma(x)=\mathrm{erf}(x)$); Right: Sign activation ($\sigma(x)=\mathrm{sign}(x)$).
  • Figure 2: $E_{\text{test}}$ vs the deterministic approximation given in \ref{theorem:rf_error} for various flattened image classification datasets with different sizes of hidden layers $d$ and ridge parameter $\delta$. Sine activation ($\sigma=\sin$), $n_{\text{train}}=1500$, $n_{\text{test}}=1000$. Upper left: MNIST \cite{deng2012mnist}; Upper right: Fashion-MNIST \cite{xiao2017fashionmnist}; Lower left: CIFAR-10 \cite{Krizhevsky09learningmultiple}; Lower right: CIFAR-100 \cite{Krizhevsky09learningmultiple}.
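
The empirical quantity $E_{\text{test}}$ plotted in these figures can be illustrated with a minimal sketch of random features ridge regression using a sine activation, $d$ random features, and ridge parameter $\delta$. The weight scaling, the mean-squared-error normalization, and the function name `rf_ridge_test_error` are assumptions made for illustration; the paper's exact conventions may differ, and this is not the authors' code.

```python
import numpy as np


def rf_ridge_test_error(X_train, y_train, X_test, y_test, d, delta, rng):
    """Empirical test error of random features ridge regression (sketch).

    Assumptions (not taken from the paper): Gaussian weights scaled by
    1/sqrt(n0), sine activation, and mean squared error on the test set.
    """
    n0 = X_train.shape[1]
    # Random first-layer weights, fixed after sampling (never trained).
    W = rng.standard_normal((n0, d)) / np.sqrt(n0)
    F_train = np.sin(X_train @ W)
    F_test = np.sin(X_test @ W)
    # Ridge solution: w = (F^T F + delta I)^{-1} F^T y.
    gram = F_train.T @ F_train + delta * np.eye(d)
    w = np.linalg.solve(gram, F_train.T @ y_train)
    residual = F_test @ w - y_test
    return float(np.mean(residual ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_test, n0 = 200, 100, 50
    beta = rng.standard_normal(n0) / np.sqrt(n0)
    X_train = rng.standard_normal((n_train, n0))
    X_test = rng.standard_normal((n_test, n0))
    y_train, y_test = X_train @ beta, X_test @ beta
    for d in (50, 200, 400):
        err = rf_ridge_test_error(X_train, y_train, X_test, y_test, d, 0.1, rng)
        print(d, err)
```

Because both datasets are held fixed and only the weights are random, repeating the call with fresh weights gives a sense of how tightly the error concentrates, which is the setting a deterministic equivalent describes.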

Theorems & Definitions (103)

  • Theorem 2.1: Main Properties
  • Lemma 2.1
  • Theorem 2.2: Global Anisotropic Law for Pseudo-resolvents
  • Theorem 2.3
  • Lemma 4.1
  • proof
  • Corollary 4.1
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • ...and 93 more