Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning
Pratik Patil, Jin-Hong Du, Ryan J. Tibshirani
TL;DR
This work rethinks model complexity under overparameterization by introducing random-X degrees of freedom (df), which carry the classical fixed-X notion over to modern predictive settings. It defines two random-X df notions, intrinsic and emergent, through random-X optimism, and matches that optimism to least-squares references in order to quantify complexity for arbitrary predictors, including interpolators. The authors develop theory for ridge, ridgeless, lasso, and convex regularized estimators, derive asymptotic equivalents under proportional regimes, and validate the results in experiments across regression families and distribution shifts. A key insight is that emergent df typically exceeds intrinsic df, with the gap reflecting bias contributions. A second is that random-X df map to random-X prediction error through a universal mapping function. The framework also allows df to be decomposed into components due to bias and covariate shift, which makes it a tractable tool for studying generalization in high-dimensional, non-smooth, and interpolating models, including practical estimators such as ridge, lasso, and random forests.
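The matching step mentioned above can be illustrated with a minimal sketch. It assumes, as a reference that is not quoted in this summary, the standard result that least squares on d isotropic Gaussian features with n > d + 1 samples and a well-specified linear model has random-X optimism sigma^2 (d/n + d/(n - d - 1)). The function names `ls_random_x_optimism` and `match_df` are hypothetical, and the paper's exact definitions of intrinsic and emergent df may differ in detail.

```python
import math

def ls_random_x_optimism(d, n, sigma2=1.0):
    """Assumed reference: random-X optimism of least squares with d features.

    Uses sigma2 * (d/n + d/(n - d - 1)), valid for 0 <= d < n - 1.
    Returns infinity at or beyond d = n - 1, where the formula blows up.
    """
    if d < 0:
        raise ValueError("d must be nonnegative")
    if d >= n - 1:
        return math.inf
    return sigma2 * (d / n + d / (n - d - 1))

def match_df(optimism, n, sigma2=1.0, iters=200):
    """Find the (possibly fractional) d whose reference optimism equals `optimism`.

    The reference map is increasing in d on [0, n - 1), so bisection suffices.
    """
    lo, hi = 0.0, n - 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ls_random_x_optimism(mid, n, sigma2) < optimism:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: the optimism of a 10-feature reference maps back to df = 10.
n = 100
opt = ls_random_x_optimism(10, n)
print(round(match_df(opt, n), 6))
```

In practice the optimism passed to `match_df` would be an estimate for the predictor of interest (test error minus training error), so the returned value need not be an integer.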
Abstract
Common practice in modern machine learning involves fitting a large number of parameters relative to the number of observations. These overparameterized models can exhibit surprising generalization behavior, e.g., "double descent" in the prediction error curve when plotted against the raw number of model parameters, or another simplistic notion of complexity. In this paper, we revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom. Whereas the classical definition is connected to fixed-X prediction error (in which prediction error is defined by averaging over the same, nonrandom covariate points as those used during training), our extension of degrees of freedom is connected to random-X prediction error (in which prediction error is averaged over a new, random sample from the covariate distribution). The random-X setting more naturally embodies modern machine learning problems, where highly complex models, even those complex enough to interpolate the training data, can still lead to desirable generalization performance under appropriate conditions. We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments, and illustrate how they can be used to interpret and compare arbitrary prediction models.
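For orientation, the classical fixed-X notion that the abstract starts from can be computed in closed form for linear smoothers. The sketch below is an illustration rather than the paper's code. It evaluates the textbook effective degrees of freedom of ridge regression, the trace of the smoother matrix X (X^T X + lam I)^{-1} X^T, through the singular values of X. The unnormalized penalty lam * ||b||^2 and the helper name `ridge_fixed_x_df` are assumptions made here.

```python
import numpy as np

def ridge_fixed_x_df(X, lam):
    """Fixed-X effective df of ridge: sum_i s_i^2 / (s_i^2 + lam), for lam > 0.

    Here s_i are the singular values of X; the sum equals
    trace(X (X^T X + lam I)^{-1} X^T).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2 / (s**2 + lam)))

# Synthetic design, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))

# df approaches the number of features as lam -> 0 and shrinks as lam grows.
for lam in (1e-8, 1.0, 100.0):
    print(lam, ridge_fixed_x_df(X, lam))
```

This quantity is tied to training points only, which is why it says little about interpolators: the random-X extension described in the abstract replaces it with a measure tied to error at new covariate draws.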
