Input-Label Correlation Governs a Linear-to-Nonlinear Transition in Random Features under Spiked Covariance

Samet Demir, Zafer Dogan

TL;DR

The paper establishes a universality principle under anisotropy and characterizes the generalization error of the random feature model (RFM) via an equivalent noisy polynomial model. The effective degree of that polynomial is governed by the strength of input-label correlation, which yields an explicit phase boundary in the correlation-spike-magnitude plane.

Abstract

Random feature models (RFMs), two-layer networks with a randomly initialized fixed first layer and a trained linear readout, are among the simplest nonlinear predictors. Prior asymptotic analyses in the proportional high-dimensional regime show that, under isotropic data, RFMs reduce to noisy linear models and offer no advantage over classical linear methods such as ridge regression. Yet RFMs frequently outperform linear baselines on structured real data. We show that this tension is explained by a correlation-driven phase transition: under spiked-covariance designs, the interaction between anisotropy and input-label correlation determines whether the RFM behaves as an effectively linear predictor or exhibits genuinely nonlinear gains. Concretely, we establish a universality principle under anisotropy and characterize the RFM generalization error via an equivalent noisy polynomial model. The effective degree of this polynomial, equivalently, which Hermite orders of the activation survive, is governed by the strength of input-label correlation, yielding an explicit boundary in the correlation-spike-magnitude plane. Below the boundary, the RFM collapses to a linear surrogate and can underperform strong linear baselines; above it, higher-order terms persist and the RFM achieves a clear nonlinear advantage. Numerical simulations and real-data experiments corroborate the theory and delineate the transition between these two regimes.
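The setup described in the abstract can be illustrated with a minimal sketch: a fixed random first layer, a ridge-trained linear readout, and inputs drawn from a spiked covariance. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the covariance form $I + \theta\gamma\gamma^\top$, the ReLU teacher $y = \mathrm{ReLU}(\xi^\top x)$, the parameterization of input-label correlation as $\alpha = \langle \gamma, \xi \rangle$, and all scalings.

```python
import numpy as np


def relu(t):
    return np.maximum(t, 0.0)


def sample_spiked_data(num_samples, dim, theta, alpha, rng):
    """Draw inputs with covariance I + theta * gamma gamma^T and teacher labels.

    Assumed (hypothetical) setup: gamma is the spike direction, xi is the
    teacher direction, and alpha = <gamma, xi> sets the input-label correlation.
    """
    gamma = np.zeros(dim)
    gamma[0] = 1.0
    ortho = np.zeros(dim)
    ortho[1] = 1.0
    xi = alpha * gamma + np.sqrt(1.0 - alpha**2) * ortho  # unit norm
    z = rng.standard_normal((num_samples, dim))
    # (I + c * gamma gamma^T) z has covariance I + theta * gamma gamma^T
    # when c = sqrt(1 + theta) - 1.
    c = np.sqrt(1.0 + theta) - 1.0
    x = z + c * np.outer(z @ gamma, gamma)
    y = relu(x @ xi)  # assumed ReLU teacher
    return x, y


def fit_rfm(x_train, y_train, num_features, lam, rng, activation=relu):
    """Random first layer (never trained) plus ridge-regression readout."""
    dim = x_train.shape[1]
    w = rng.standard_normal((dim, num_features)) / np.sqrt(dim)
    feats = activation(x_train @ w)
    gram = feats.T @ feats + lam * np.eye(num_features)
    readout = np.linalg.solve(gram, feats.T @ y_train)
    return w, readout


def predict_rfm(x, w, readout, activation=relu):
    return activation(x @ w) @ readout


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_tr, y_tr = sample_spiked_data(500, 100, theta=10.0, alpha=1.0, rng=rng)
    x_te, y_te = sample_spiked_data(2000, 100, theta=10.0, alpha=1.0, rng=rng)
    w, a = fit_rfm(x_tr, y_tr, num_features=200, lam=1e-2, rng=rng)
    test_mse = np.mean((y_te - predict_rfm(x_te, w, a)) ** 2)
    print("RFM test MSE:", test_mse)
```

Passing `activation=lambda t: t` to both `fit_rfm` and `predict_rfm` gives a linear baseline within the same code path, so one could sweep `alpha` and `theta` and compare the two test errors. This is only a way to explore the qualitative claim; it does not reproduce the paper's experiments.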

Paper Structure (37 sections, 25 theorems, 129 equations, 7 figures)

Key Result

Theorem 1

Let $\sigma(x)$ and $\hat{\sigma}(x)$ be two activation functions satisfying S.6. Suppose that Assumptions S.1-S.6 given in Section \ref{sec:assumptions} hold. If are satisfied for some $\varsigma > 0$, then

Figures (7)

  • Figure 1: The phase boundary in the $(\alpha,\theta)$ plane. Left: the red curve, predicted by Corollary \ref{cor:linear}, separates the effectively linear regime (below) from the genuinely nonlinear regime (above). Right: when $\alpha = \mathcal{O}(1/\sqrt{n})$, the RFM lies in the linear regime and matches the noisy linear model across activations. Here $\sigma_*=\sigma_{\mathrm{ReLU}}$, $\lambda=10^{-2}$, $n=400$, $m=500$; averages over 50 Monte Carlo runs. Simulation results for the nonlinear regime are given in Figure \ref{fig:polynomial_equivalence}.
  • Figure 2: The nonlinear side of the phase boundary. When $\alpha=1$ and $\theta = n^{1/2}$, the RFM crosses the phase boundary and the noisy linear model (\ref{eq:gaussian_model}) no longer suffices. The noisy polynomial model (\ref{eq:polynomial_model}) captures the RFM generalization error accurately. Here $\lambda=10^{-3}$, $n=400$, $m=500$; averages over 50 Monte Carlo runs.
  • Figure 3: Comparison of activation functions: generalization errors of the RFM with different nonlinearities (linear, polynomial, Softplus, ReLU) as a function of the number of samples (left), alignment (center), and spike magnitude $\theta \asymp n^\beta$ (right). Here $n=400$, $k=500$, $\lambda=10^{-2}$, and $\sigma_*=\sigma_{\mathrm{ReLU}}$. The degree of the equivalent polynomial model (\ref{eq:equivalent_activation}) is limited to at most $l=4$ for numerical stability. Averages over 50 Monte Carlo runs.
  • Figure 4: The phase transition on real data: CIFAR-10 classification (airplane vs. automobile). Input-label correlation is controlled by interpolating between true and random labels; the highest correlation norm uses true labels, the lowest uses random labels. As correlation increases, the RFM separates from the linear model, consistent with the predicted phase transition. Adding Gaussian noise (right) isolates the correlation effect. Here $l = 5$, $\lambda = 10^{-1}$, $k=n=3072$, $m=4000$; averages over 50 Monte Carlo runs. Details in Supplementary \ref{supplement:CIFAR-10}.
  • Figure SM1: Training errors for the misaligned case (the setting in Figure \ref{fig:linear_equivalence}).
  • ...and 2 more figures
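
The captions above refer to an equivalent polynomial model truncated at a maximum degree $l$, and the abstract ties that degree to which Hermite orders of the activation survive. As a rough illustration of the truncation step only, the sketch below computes Hermite coefficients of an activation by Gauss-Hermite quadrature and builds the degree-$l$ truncation. It assumes probabilists' Hermite polynomials $\mathrm{He}_j$ under the standard Gaussian measure, with $c_j = \mathbb{E}[\sigma(g)\,\mathrm{He}_j(g)]/j!$. The function names are hypothetical, and the paper's own normalization, noise term, and criterion for which orders survive are not reproduced here.

```python
from math import factorial, pi, sqrt

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_coefficients(activation, max_degree, quad_points=80):
    """Return c_0..c_max_degree with c_j = E[activation(g) He_j(g)] / j!, g ~ N(0, 1)."""
    # hermegauss integrates against exp(-x^2 / 2); rescale to the N(0, 1) density.
    nodes, weights = He.hermegauss(quad_points)
    weights = weights / sqrt(2.0 * pi)
    values = activation(nodes)
    coeffs = []
    for j in range(max_degree + 1):
        basis = He.hermeval(nodes, [0.0] * j + [1.0])  # He_j evaluated at the nodes
        coeffs.append(np.sum(weights * values * basis) / factorial(j))
    return np.array(coeffs)


def truncated_activation(coeffs):
    """Polynomial x -> sum_j coeffs[j] * He_j(x)."""
    return lambda x: He.hermeval(x, coeffs)


if __name__ == "__main__":
    relu_coeffs = hermite_coefficients(lambda t: np.maximum(t, 0.0), max_degree=4)
    poly = truncated_activation(relu_coeffs)
    print("ReLU Hermite coefficients up to degree 4:", relu_coeffs)
    print("Truncated ReLU at x = 1:", poly(1.0))
```

For a non-smooth activation such as ReLU the quadrature is only approximate, so the number of quadrature points is a tuning choice rather than a prescribed value.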

Theorems & Definitions (49)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Corollary 3
  • Remark 4
  • Remark 5
  • Lemma 6
  • Lemma 7
  • proof
  • ...and 39 more