Classical Statistical (In-Sample) Intuitions Don't Generalize Well: A Note on Bias-Variance Tradeoffs, Overfitting and Moving from Fixed to Random Designs

Alicia Curth

TL;DR

This note highlights that the simple move from fixed to random designs has (perhaps surprisingly) far-reaching consequences for textbook intuitions relating to the bias-variance tradeoff, and comments on the resulting (im)possibility of observing double descent and benign overfitting in fixed versus random designs.

Abstract

The sudden appearance of modern machine learning (ML) phenomena like double descent and benign overfitting may leave many classically trained statisticians feeling uneasy -- these phenomena appear to go against the very core of statistical intuitions conveyed in any introductory class on learning from data. The historical lack of earlier observation of such phenomena is usually attributed to today's reliance on more complex ML methods, overparameterization, interpolation and/or higher data dimensionality. In this note, we show that there is another reason why we observe behaviors today that appear at odds with intuitions taught in classical statistics textbooks, which is much simpler to understand yet rarely discussed explicitly. In particular, many intuitions originate in fixed design settings, in which in-sample prediction error (under resampling of noisy outcomes) is of interest, while modern ML evaluates its predictions in terms of generalization error, i.e. out-of-sample prediction error in random designs. Here, we highlight that this simple move from fixed to random designs has (perhaps surprisingly) far-reaching consequences on textbook intuitions relating to the bias-variance tradeoff, and comment on the resulting (im)possibility of observing double descent and benign overfitting in fixed versus random designs.
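The two error notions contrasted in the abstract can be written out as follows. This is a sketch under assumed notation: the symbols $x_i$, $y_i'$, $x_0$, $y_0$, $\hat f$ and $\epsilon$ are illustrative choices rather than taken from the paper, while $ERR_{is}$, $ERR_{oos}$, $f^*$ and $\sigma$ follow the figure captions. Expectations are taken over the noise in the training outcomes as well as the new outcomes.

```latex
% Fixed design: training inputs x_1, ..., x_n are held fixed and only
% the noisy outcomes are resampled at those same inputs.
\begin{align}
ERR_{is}  &= \frac{1}{n} \sum_{i=1}^{n}
             \mathbb{E}\left[ \left( y_i' - \hat f(x_i) \right)^2 \right],
          & y_i' &= f^*(x_i) + \epsilon_i', \\
% Random design: the test input x_0 is a fresh draw from the input
% distribution and generally differs from every training input.
ERR_{oos} &= \mathbb{E}_{x_0, y_0}\left[ \left( y_0 - \hat f(x_0) \right)^2 \right],
          & y_0 &= f^*(x_0) + \epsilon_0,
\end{align}
% with independent, mean-zero noise terms of variance \sigma^2.
```

The only difference between the two lines is where the predictions are evaluated: at the training inputs themselves, or at new inputs.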

Paper Structure (17 sections, 8 equations, 5 figures)

Figures (5)

  • Figure 1: Stylized example: The 1-NN estimator does not necessarily have the lowest bias when considering test inputs different from training inputs.
  • Figure 2: The classical bias-variance tradeoff occurs in in-sample prediction error, but not in out-of-sample prediction error -- where decreasing $k$ can decrease both bias and variance. The behavior of prediction error, bias and variance by $k$ for kNN estimators, in-sample (orange) and out-of-sample (green). Data simulated using $f^*(x)$ from \ref{eq:marsmult} with $\sigma=5$.
  • Figure 3: Bias alone can cause the U-shape in out-of-sample prediction error (while in-sample the U-shape is caused by the bias-variance tradeoff and thus appears only when $\sigma>0$). The behavior of prediction error by $k$ for kNN estimators, in-sample (orange) and out-of-sample (green) across different levels of noise in outcomes $\sigma$.
  • Figure 4: A double descent shape appears only in out-of-sample prediction error, not in in-sample prediction error. The behavior of in- and out-of-sample prediction error ($ERR_{is}$ and $ERR_{oos}$) as we vary the number of features $p$ included in a linear regression with $n=100$ training examples. In the underlying DGP, $\sigma=\frac{1}{2}$ and only the first $s=50$ features are used in $f^*$; all other $p-s$ features are irrelevant for prediction.
  • Figure 5: The bias due to lack of a perfect close neighbor match dominates the bias term out-of-sample. The behavior of the Squared NeighborMatchingBias and Squared AveragingBias by $k$ for kNN estimators, in-sample (orange) and out-of-sample (green) for a nonlinear (left) and linear DGP (right).
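The kNN comparisons described in the captions above can be explored with a small sketch. It is not the paper's experimental code: the regression function `f_star`, the sample sizes, the uniform input distribution and `sigma = 1.0` are all hypothetical choices made for illustration. It relies on the fact that a kNN prediction is a weighted average of training outcomes, so, conditional on the training inputs, squared bias and noise variance can be computed exactly without Monte Carlo resampling.

```python
import numpy as np


def knn_weights(X_train, X_query, k):
    """Return W where row i puts weight 1/k on the k training points nearest to query i."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((X_query.shape[0], X_train.shape[0]))
    np.put_along_axis(W, idx, 1.0 / k, axis=1)
    return W


def f_star(X):
    # Hypothetical smooth regression function, NOT the DGP used in the paper.
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2


def bias_variance(X_train, X_query, k, sigma):
    """Average squared bias and variance of kNN predictions at X_query,
    conditional on the training inputs (only outcome noise is random)."""
    W = knn_weights(X_train, X_query, k)
    bias_sq = np.mean((W @ f_star(X_train) - f_star(X_query)) ** 2)
    variance = sigma ** 2 * np.mean((W ** 2).sum(axis=1))
    return bias_sq, variance


rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(200, 2))  # training inputs, held fixed
X_test = rng.uniform(-1, 1, size=(1000, 2))  # fresh inputs for the out-of-sample case
sigma = 1.0

results = {}
for k in (1, 5, 25):
    results[k] = {
        # In-sample: evaluate at the training inputs themselves.
        "in_sample": bias_variance(X_train, X_train, k, sigma),
        # Out-of-sample: evaluate at inputs not seen during training.
        "out_of_sample": bias_variance(X_train, X_test, k, sigma),
    }
    print(k, results[k])
```

With `k = 1`, every training point is its own nearest neighbor, so the in-sample bias is zero, while the out-of-sample bias is strictly positive because no test input coincides with a training input. Holding the training inputs fixed in the out-of-sample case is a simplification of the random design setting, in which the training inputs would be redrawn as well.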