Random features and polynomial rules
Fabián Aguirre-López, Silvio Franz, Mauro Pastore
TL;DR
This work analyzes the generalization of random features models (RFMs) with Gaussian inputs by mapping the RFM to an equivalent polynomial model via a Hermite expansion of the activation. Using replica methods, the authors derive replica-symmetric saddle-point equations that show how learning proceeds order by order through the Hermite components when $N \sim D^L$ and $P \sim D^K$, producing staircase generalization curves and an interpolation peak when $N \approx P$. The key insight is that the RFM behaves as a high-rank kernel machine for $P \ll N$ and as a degree-$L$ polynomial student when $P$ and $N$ scale as powers of $D$, with higher-order Hermite components acting as noise that drives the interpolation peak. The authors validate their analytic predictions with numerical experiments across broad parameter regimes and provide a finite-size effective theory that captures realistic network sizes. These results deepen the understanding of feature learning and generalization in RFMs beyond the traditional proportional-scaling limit, with implications for kernel methods and lazy training regimes.
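The Hermite expansion mentioned above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it estimates the normalized Hermite coefficients $\mu_\ell = \mathbb{E}[\sigma(z)\,\mathrm{He}_\ell(z)]/\sqrt{\ell!}$ of an activation $\sigma$ for $z \sim \mathcal{N}(0,1)$ by Gauss-Hermite quadrature. The function name `hermite_coefficients`, the normalization convention, and the choice of activations are assumptions made for illustration.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(activation, max_order, quad_points=60):
    """Return mu_l = E[activation(z) He_l(z)] / sqrt(l!) for l = 0..max_order, z ~ N(0, 1).

    He_l are the probabilists' Hermite polynomials. The 1/sqrt(l!) factor is an
    assumed normalization that makes the basis orthonormal under the Gaussian measure.
    """
    # Gauss-Hermite nodes and weights for the weight function exp(-z^2 / 2).
    z, w = hermegauss(quad_points)
    w = w / math.sqrt(2.0 * math.pi)  # rescale so the weights sum to 1
    values = activation(z)
    coeffs = []
    for order in range(max_order + 1):
        basis = np.zeros(order + 1)
        basis[order] = 1.0  # coefficient vector that selects He_order
        he = hermeval(z, basis)
        coeffs.append(np.sum(w * values * he) / math.sqrt(math.factorial(order)))
    return np.array(coeffs)


# Illustrative activations (assumptions, not taken from the paper).
mu_tanh = hermite_coefficients(np.tanh, max_order=5)
mu_square = hermite_coefficients(lambda z: z**2, max_order=4)
```

For $\sigma(z) = z^2 = \mathrm{He}_0(z) + \mathrm{He}_2(z)$ the sketch returns approximately $(1, 0, \sqrt{2}, 0, 0)$, and for the odd function $\tanh$ the even-order coefficients vanish. In the picture summarized above, the coefficients of order up to $L$ define the polynomial student, while the remaining ones play the role of noise.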
Abstract
Random features models play a distinguished role in the theory of deep learning, describing the behavior of neural networks close to their infinite-width limit. In this work, we present a thorough analysis of the generalization performance of random features models for generic supervised learning problems with Gaussian data. Our approach, built with tools from the statistical mechanics of disordered systems, maps the random features model to an equivalent polynomial model, and allows us to plot average generalization curves as functions of the two main control parameters of the problem: the number of random features $N$ and the size $P$ of the training set, both assumed to scale as powers of the input dimension $D$. Our results extend the case of proportional scaling between $N$, $P$ and $D$. They are in accordance with rigorous bounds known for certain particular learning tasks and are in quantitative agreement with numerical experiments performed over many orders of magnitude of $N$ and $P$. We also find good agreement far from the asymptotic limits where $D\to \infty$ and at least one of $P/D^K$ and $N/D^L$ remains finite.
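To make the roles of $N$, $P$ and $D$ concrete, here is a minimal sketch of a random features model trained on Gaussian inputs. It is an illustration under stated assumptions, not the setup analyzed in the paper: the linear teacher, the $1/\sqrt{D}$ scaling, the `tanh` activation, the ridge-regression readout, and all sizes are hypothetical choices.

```python
import numpy as np


def random_features(X, F, activation=np.tanh):
    """Map inputs X (P x D) to features (P x N) using fixed random weights F (N x D)."""
    D = X.shape[1]
    return activation(X @ F.T / np.sqrt(D))  # 1/sqrt(D) scaling is an assumption


def fit_readout(Phi, y, ridge=1e-3):
    """Ridge regression on the features; only the readout weights are trained."""
    N = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(N), Phi.T @ y)


rng = np.random.default_rng(0)
D, N, P, P_test = 20, 200, 100, 500      # illustrative sizes only
F = rng.standard_normal((N, D))          # random, frozen first-layer weights
teacher = rng.standard_normal(D)         # hypothetical linear teacher
X_train = rng.standard_normal((P, D))    # Gaussian training inputs
X_test = rng.standard_normal((P_test, D))
y_train = X_train @ teacher / np.sqrt(D)
y_test = X_test @ teacher / np.sqrt(D)

a = fit_readout(random_features(X_train, F), y_train)
train_error = np.mean((random_features(X_train, F) @ a - y_train) ** 2)
test_error = np.mean((random_features(X_test, F) @ a - y_test) ** 2)
```

With $N > P$ and a small ridge, the model nearly interpolates the training set, so `train_error` is close to zero while `test_error` stays finite. Repeating the fit over a grid of $N$ and $P$ values is one way to trace generalization curves empirically.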
