Asymptotics of Learning with Deep Structured (Random) Features
Dominik Schröder, Daniil Dmitriev, Hugo Cui, Bruno Loureiro
TL;DR
The work gives a rigorous high-dimensional analysis of the test error incurred when learning only the readout layer on top of deep structured random features, and expresses that error in terms of the population covariances of the feature maps. It develops anisotropic deterministic equivalents via random matrix theory and provides a closed-form recursion for the feature covariances in Gaussian rainbow networks, which makes the predictions computable for misspecified, deep, structured features. The results also connect deep random feature models to trained networks: linearizing the network and studying the resulting effective linear features can capture the observed learning curves in lazy regimes, and can align with some real-data trends when the covariances are estimated from data. The framework offers a principled way to quantify inductive biases and generalization in deep architectures with structured randomness, and may also inform model selection and the study of gradient-descent dynamics in high dimensions.
Abstract
For a large class of feature maps we provide a tight asymptotic characterization of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large. This characterization is formulated in terms of the population covariance of the features. Our work is partially motivated by the problem of learning with Gaussian rainbow neural networks, namely deep non-linear fully-connected networks with random but structured weights, whose row-wise covariances are further allowed to depend on the weights of previous layers. For such networks we also derive a closed-form formula for the feature covariance in terms of the weight matrices. We further find that in some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
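To make the setting concrete, the following is a minimal numerical sketch of the problem the abstract describes: inputs pass through fixed deep random layers whose rows have a non-identity covariance, and only the readout is fitted by ridge regression. It is an illustration under stated assumptions rather than the authors' code. The power-law row covariance, the tanh activation, the single-index teacher, and every name (`sample_structured_weights`, `deep_random_features`, `ridge_readout`, `run_experiment`) are hypothetical choices. The sketch measures the test error empirically and does not implement the paper's asymptotic formula.

```python
import numpy as np


def sample_structured_weights(rng, widths, d):
    """Gaussian weights whose rows share a diagonal, non-identity covariance.

    The power-law spectrum is an assumption made for illustration only.
    """
    weights, fan_in = [], d
    for k in widths:
        spectrum = 1.0 / np.sqrt(np.arange(1, fan_in + 1))
        spectrum *= fan_in / spectrum.sum()  # normalize the trace to fan_in
        # Each row is drawn from N(0, diag(spectrum)).
        weights.append(rng.standard_normal((k, fan_in)) * np.sqrt(spectrum))
        fan_in = k
    return weights


def deep_random_features(X, weights, activation=np.tanh):
    """Propagate inputs through the fixed (untrained) hidden layers."""
    H = X
    for W in weights:
        H = activation(H @ W.T / np.sqrt(W.shape[1]))
    return H


def ridge_readout(Phi, y, lam):
    """Closed-form ridge estimator for the readout layer."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)


def run_experiment(seed=0, d=100, widths=(150, 120),
                   n_train=300, n_test=2000, lam=1e-2):
    """Fit the readout on deep structured random features; return test MSE."""
    rng = np.random.default_rng(seed)
    weights = sample_structured_weights(rng, widths, d)

    # Hypothetical single-index teacher, so the features are misspecified.
    teacher = rng.standard_normal(d) / np.sqrt(d)
    X_train = rng.standard_normal((n_train, d))
    X_test = rng.standard_normal((n_test, d))
    y_train = np.tanh(X_train @ teacher)
    y_test = np.tanh(X_test @ teacher)

    Phi_train = deep_random_features(X_train, weights)
    Phi_test = deep_random_features(X_test, weights)
    theta = ridge_readout(Phi_train, y_train, lam)
    test_error = float(np.mean((Phi_test @ theta - y_test) ** 2))
    return theta, test_error


if __name__ == "__main__":
    _, err = run_experiment()
    print(f"empirical test error: {err:.4f}")
```

Sweeping `n_train` while keeping `d` and `widths` fixed traces an empirical learning curve. In the proportional regime, this is the kind of curve that the asymptotic characterization is meant to predict.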
