The $\varphi$ Curve: The Shape of Generalization through the Lens of Norm-based Capacity Control
Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu
TL;DR
The paper reframes generalization through norm-based capacity rather than model size, using random features models to obtain precise learning-curve characterizations via deterministic equivalents. It establishes that the test risk can be captured by a norm-based quantity with a phase transition between under- and over-parameterization, and shows that double descent does not appear once capacity is measured appropriately. A linear relation between risk and norm emerges in the over-parameterized regime, while power-law settings yield explicit scaling laws, reinforcing that norm control (e.g., via regularization) shapes classical U-shaped generalization curves. These results offer a principled lens for understanding generalization in large, over-parameterized systems and provide new deterministic tools for broader applications, including scaling analyses and potential out-of-distribution (OOD) studies.
Abstract
Understanding how the test risk scales with model complexity is a central question in machine learning. Classical theory is challenged by the learning curves observed for large over-parameterized deep networks. Capacity measures based on parameter count typically fail to account for these empirical observations. To tackle this challenge, we consider norm-based capacity measures and develop our study for random-features-based estimators, widely used as simplified theoretical models for more complex networks. In this context, we provide a precise characterization of how the estimator's norm concentrates and how it governs the associated test error. Our results show that the predicted learning curve admits a phase transition from under- to over-parameterization, but no double descent behavior. This confirms that the more classical U-shaped behavior is recovered when considering appropriate capacity measures based on model norms rather than size. From a technical point of view, we leverage deterministic equivalence as the key tool and further develop new deterministic quantities which are of independent interest.
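The setting in the abstract can be explored numerically with a small experiment. The sketch below is not the authors' code: it assumes a hypothetical data model (Gaussian inputs, a noisy linear target), ReLU random features, and a plain ridge penalty `lam`, and the function name `random_features_ridge` is ours. It fits the random features estimator for several feature counts `p` and reports the norm of the fitted coefficients alongside the test error, so that the test error can be plotted against either the parameter count or the norm.

```python
import numpy as np


def random_features_ridge(n=200, d=20, p=100, lam=1e-3,
                          n_test=1000, noise=0.1, seed=0):
    """Fit ridge regression on ReLU random features.

    Returns (coefficient l2 norm, test mean squared error).
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Assumed data model: Gaussian inputs, noisy linear target.
    beta = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    X_test = rng.standard_normal((n_test, d))
    y = X @ beta + noise * rng.standard_normal(n)
    y_test = X_test @ beta  # noiseless targets for the test error

    # Random first-layer weights are sampled once and never trained.
    W = rng.standard_normal((d, p)) / np.sqrt(d)
    Z = np.maximum(X @ W, 0.0) / np.sqrt(p)
    Z_test = np.maximum(X_test @ W, 0.0) / np.sqrt(p)

    # Ridge solution: a = (Z^T Z + lam * I)^{-1} Z^T y.
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

    norm = float(np.linalg.norm(a))
    test_mse = float(np.mean((Z_test @ a - y_test) ** 2))
    return norm, test_mse


# Sweep the number of random features across the interpolation
# threshold (p = n = 200 with the defaults above).
for p in (20, 50, 100, 200, 400, 800):
    norm, mse = random_features_ridge(p=p)
    print(f"p={p:4d}  ||a||_2={norm:10.4f}  test MSE={mse:10.4f}")
```

Plotting the test error against `p` and against the norm gives two views of the same sweep; how closely a finite-sample run like this tracks the asymptotic predictions depends on `n`, `d`, `lam`, and the noise level chosen here.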
