Classical Statistical (In-Sample) Intuitions Don't Generalize Well: A Note on Bias-Variance Tradeoffs, Overfitting and Moving from Fixed to Random Designs

Alicia Curth

TL;DR

This note highlights that the simple move from fixed to random designs has (perhaps surprisingly) far-reaching consequences for textbook intuitions about the bias-variance tradeoff, and comments on the resulting (im)possibility of observing double descent and benign overfitting in fixed versus random designs.

Abstract

The sudden appearance of modern machine learning (ML) phenomena like double descent and benign overfitting may leave many classically trained statisticians feeling uneasy: these phenomena appear to go against the very core of statistical intuitions conveyed in any introductory class on learning from data. That such phenomena were not observed earlier is usually attributed to today's reliance on more complex ML methods, overparameterization, interpolation and/or higher data dimensionality. In this note, we show that there is another reason why we observe behaviors today that appear at odds with intuitions taught in classical statistics textbooks, one that is much simpler to understand yet rarely discussed explicitly. In particular, many intuitions originate in fixed design settings, in which in-sample prediction error (under resampling of noisy outcomes) is of interest, while modern ML evaluates its predictions in terms of generalization error, i.e., out-of-sample prediction error in random designs. Here, we highlight that this simple move from fixed to random designs has (perhaps surprisingly) far-reaching consequences for textbook intuitions about the bias-variance tradeoff, and comment on the resulting (im)possibility of observing double descent and benign overfitting in fixed versus random designs.
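For reference, the two error notions can be written as follows. This is one standard formalization, and the notation ($\hat f$ for the fitted predictor, $\boldsymbol{\epsilon}=(\epsilon_1,\dots,\epsilon_n)$ for the training noise, $\epsilon_i'$ and $\epsilon_0$ for fresh independent noise draws, $X_0$ for a new input drawn from the input distribution) is ours and may differ from the paper's. Outcomes follow $y_i = f^*(x_i)+\epsilon_i$ with mean-zero noise of variance $\sigma^2$, and the training inputs $x_1,\dots,x_n$ are held fixed in both cases (additionally averaging over them is a common alternative for random designs):

$$ERR_{is} = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\boldsymbol{\epsilon},\epsilon_i'}\Big[\big(f^*(x_i)+\epsilon_i'-\hat f(x_i)\big)^2\Big] = \sigma^2 + \frac{1}{n}\sum_{i=1}^{n}\Big(\mathbb{E}_{\boldsymbol{\epsilon}}\big[\hat f(x_i)\big]-f^*(x_i)\Big)^2 + \frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}_{\boldsymbol{\epsilon}}\big(\hat f(x_i)\big)$$

$$ERR_{oos} = \mathbb{E}_{X_0}\,\mathbb{E}_{\boldsymbol{\epsilon},\epsilon_0}\Big[\big(f^*(X_0)+\epsilon_0-\hat f(X_0)\big)^2\Big] = \sigma^2 + \mathbb{E}_{X_0}\Big[\Big(\mathbb{E}_{\boldsymbol{\epsilon}}\big[\hat f(X_0)\big]-f^*(X_0)\Big)^2\Big] + \mathbb{E}_{X_0}\Big[\mathrm{Var}_{\boldsymbol{\epsilon}}\big(\hat f(X_0)\big)\Big]$$

Both errors decompose into noise, squared bias and variance; the difference lies only in where bias and variance are evaluated. For example, the 1-NN estimator satisfies $\hat f(x_i)=y_i$ at every training input, so its in-sample bias is zero and its variance is $\sigma^2$, giving $ERR_{is}=2\sigma^2$. At a new input $X_0$ it instead predicts the outcome of the nearest training input $x_j$, incurring a bias $f^*(x_j)-f^*(X_0)$ that is generally nonzero.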

Paper Structure (17 sections, 8 equations, 5 figures)

Introduction
Problem setup: Fixed vs Random designs
How the move to random design settings affects the bias-variance tradeoff
The classical bias-variance tradeoff intuition: in-sample view
New territories: Bias-bias-variance tradeoffs in random design settings
Empirical investigation: How do bias terms evolve in- and out-of-sample?
Non-monotonic behavior of out-of-sample bias.
The effect of sampling noise.
Conclusion: The bias-variance tradeoff does not necessarily hold out-of-sample as it does in-sample
Reconciling double descent with textbook intuitions about the bias-variance tradeoff
Can overfitting be benign? Understanding overfitting requires refining our vocabulary
Can interpolation be benign in fixed design settings? (A: No!)
Can interpolation be benign in random design settings? (A: Yes, sometimes!)
Conclusion
Implications.
(and 2 further sections)

Figures (5)

Figure 1: Stylized example: The 1-NN estimator does not necessarily have the lowest bias when considering test inputs different from training inputs.
Figure 2: The classical bias-variance tradeoff occurs in in-sample prediction error, but not in out-of-sample prediction error, where increasing $k$ can decrease both bias and variance. The behavior of prediction error, bias and variance by $k$ for kNN estimators, in-sample (orange) and out-of-sample (green). Data simulated using $f^*(x)$ from Eq. \ref{eq:marsmult} with $\sigma=5$.
Figure 3: Bias alone can cause the U-shape in out-of-sample prediction error (while in-sample, the U-shape is caused by the bias-variance tradeoff and thus appears only when $\sigma>0$). The behavior of prediction error by $k$ for kNN estimators, in-sample (orange) and out-of-sample (green), across different levels of outcome noise $\sigma$.
Figure 4: A double descent shape appears only in out-of-sample prediction error, not in in-sample prediction error. The behavior of in- and out-of-sample prediction error ($ERR_{is}$ and $ERR_{oos}$) as we vary the number of features $p$ included in a linear regression with $n=100$ training examples. In the underlying DGP, $\sigma=\frac{1}{2}$ and only the first $s=50$ features are used in $f^*$; all other $p-s$ features are irrelevant for prediction.
Figure 5: The bias due to the lack of a perfectly matching close neighbor dominates the bias term out-of-sample. The behavior of the squared NeighborMatchingBias and squared AveragingBias by $k$ for kNN estimators, in-sample (orange) and out-of-sample (green), for a nonlinear (left) and a linear DGP (right).
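The in-sample versus out-of-sample contrast for kNN described in Figures 1, 2 and 5 can be explored with a small NumPy sketch. It rests on illustrative assumptions that are not taken from the paper: a hypothetical regression function `f_star` (not the DGP of Eq. \ref{eq:marsmult}), two inputs drawn uniformly on $[0,1]^2$, $n=200$, $\sigma=0.5$, and training inputs held fixed while only the outcome noise is resampled. The helper names (`knn_indices`, `knn_bias_variance`, `prediction_error`) are likewise hypothetical. Because kNN regression is a linear smoother, its squared bias and variance can be computed exactly rather than by simulation.

```python
import numpy as np


def f_star(X):
    # Hypothetical nonlinear regression function (NOT the paper's DGP).
    return np.sin(2 * np.pi * X[:, 0]) + 2.0 * X[:, 1] ** 2


def knn_indices(X_train, X_query, k):
    # Indices of the k nearest training inputs (Euclidean) for each query point.
    # For in-sample queries, each point is its own nearest neighbor (distance 0).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]


def knn_bias_variance(X_train, X_query, k, sigma):
    """Exact squared bias and variance of kNN regression at X_query, averaged
    over query points, with training inputs fixed and only the iid outcome
    noise (mean 0, variance sigma**2) resampled.

    f_hat(x) is the mean of y_j over the k neighbors of x, so
    E[f_hat(x)] is the mean of f_star(x_j) and Var[f_hat(x)] = sigma**2 / k.
    """
    idx = knn_indices(X_train, X_query, k)
    mean_pred = f_star(X_train)[idx].mean(axis=1)
    bias2 = np.mean((mean_pred - f_star(X_query)) ** 2)
    var = sigma ** 2 / k
    return bias2, var


def prediction_error(X_train, X_query, k, sigma):
    # Expected squared error on a fresh noisy outcome: sigma^2 + bias^2 + variance.
    bias2, var = knn_bias_variance(X_train, X_query, k, sigma)
    return sigma ** 2 + bias2 + var


rng = np.random.default_rng(0)
n, n_test, p, sigma = 200, 500, 2, 0.5
X_train = rng.uniform(size=(n, p))      # fixed training design
X_test = rng.uniform(size=(n_test, p))  # new inputs from the same distribution

for k in [1, 2, 3, 5, 10, 20, 50]:
    b_is, v = knn_bias_variance(X_train, X_train, k, sigma)
    b_oos, _ = knn_bias_variance(X_train, X_test, k, sigma)
    print(f"k={k:3d}  bias2_is={b_is:.4f}  bias2_oos={b_oos:.4f}  var={v:.4f}")
```

At $k=1$ the in-sample squared bias is exactly zero (each training point is its own nearest neighbor), so the in-sample error equals $2\sigma^2$, whereas the out-of-sample squared bias is strictly positive. In this conditional setup the variance is $\sigma^2/k$ at every query point, so any difference between the in- and out-of-sample columns comes from the bias term alone; scanning the printed table shows whether, for this particular `f_star`, the out-of-sample bias is monotone in $k$, which the paper reports need not be the case.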
