The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks
Vittorio Erba, Emanuele Troiani, Lenka Zdeborová, Florent Krzakala
TL;DR
The paper analyzes empirical risk minimization for over-parameterized two-layer neural networks with quadratic activations trained on Gaussian data, and shows that $\ell_2$ regularization of the weights acts as a nuclear-norm penalty in an equivalent positive semidefinite (PSD) matrix-estimation problem. Using approximate message passing and Gaussian universality, it derives sharp closed-form limits for the training and test errors and for the spectrum of the learned weights in the high-dimensional regime. Learnability depends on the target's spectral width $\kappa^*$ and on the degree of over-parameterization $\kappa$. Key contributions include exact interpolation and strong-recovery thresholds, a detailed description of the learning curves, and a characterization of how performance is preserved even when the network width greatly exceeds what the data require. The findings highlight the link between low-rank matrix sensing and non-linear learning in quadratic networks, connecting spin-glass methods, convex optimization, and matrix factorization theory through precise asymptotic results.
Abstract
We study the high-dimensional asymptotics of empirical risk minimization (ERM) in over-parametrized two-layer neural networks with quadratic activations trained on synthetic data. We derive sharp asymptotics for both training and test errors by mapping the $\ell_2$-regularized learning problem to a convex matrix sensing task with nuclear norm penalization. This reveals that capacity control in such networks emerges from a low-rank structure in the learned feature maps. Our results characterize the global minima of the loss and yield precise generalization thresholds, showing how the width of the target function governs learnability. This analysis bridges and extends ideas from spin-glass methods, matrix factorization, and convex optimization and emphasizes the deep link between low-rank matrix sensing and learning in quadratic neural networks.
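The mapping described in the abstract can be checked numerically. The sketch below is illustrative only and is not the authors' code: it assumes an unnormalized network $f(x) = \sum_k (w_k \cdot x)^2$ with weight matrix $W \in \mathbb{R}^{m \times d}$, and the paper's exact scaling conventions may differ. Under that assumption, the network output equals $\langle S, x x^\top \rangle$ with $S = W^\top W$ PSD, and the $\ell_2$ penalty $\|W\|_F^2$ equals the nuclear norm of $S$. The names `quadratic_network`, `W`, `S`, `d`, and `m` are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 8, 20  # input dimension and hidden width (m > d, i.e. over-parameterized)

W = rng.standard_normal((m, d))  # first-layer weights, one row per hidden unit
x = rng.standard_normal(d)       # a single Gaussian input


def quadratic_network(W, x):
    """Two-layer network with quadratic activation and unit second-layer weights
    (assumed normalization, for illustration only)."""
    return np.sum((W @ x) ** 2)


# Equivalent matrix-sensing view: S = W^T W is PSD and f(x) = <S, x x^T>.
S = W.T @ W
net_out = quadratic_network(W, x)
sensing_out = np.trace(S @ np.outer(x, x))

# The l2 (weight-decay) penalty on W is the nuclear norm of S,
# because for a PSD matrix the nuclear norm equals the trace.
frob_sq = np.sum(W ** 2)
nuclear = np.linalg.norm(S, ord="nuc")

print(np.isclose(net_out, sensing_out), np.isclose(frob_sq, nuclear))
```

Because both quantities depend on $W$ only through $S$, minimizing the regularized loss over $W$ can be viewed as a nuclear-norm-penalized problem over PSD matrices $S$ (with rank at most $m$), which is the convex matrix sensing task the abstract refers to.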
