
A Honest Cross-Validation Estimator for Prediction Performance

Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian

TL;DR

The paper tackles the problem that standard cross-validation estimates the average performance of a family of models rather than the performance of a single model trained on a fixed dataset. It proposes a random-effects framework in which each split $k$ has a true performance $\mathrm{Err}_k$, estimated from held-out data by $\widehat{\mathrm{Err}}_k$, and develops both a hierarchical Bayesian and an empirical Bayes approach to estimate the performance $\mathrm{Err}_0$ of the model trained on a specific training set. Through simulations with a continuous outcome (mean squared prediction error, MSPE) and a binary outcome (c-index), plus a bike-sharing example, the authors show that the proposed estimators match or outperform conventional cross-validation and naive single-split estimates, especially when the test set is small or when performance varies substantially across splits. The work lets researchers report a model together with a principled estimate of its predictive performance, and suggests extensions to non-Gaussian random effects and bootstrap-based uncertainty assessment.
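
As a rough illustration of the idea, one common way to write a Gaussian random-effects model of this kind is shown below. This is a simplified sketch, not necessarily the paper's exact specification: the symbols $\mu$, $\tau^2$, $\sigma_k^2$, and $K$ are assumptions introduced here, and the splits are treated as independent, which ignores the correlation induced by overlapping training and test sets.

$$
\widehat{\mathrm{Err}}_k \mid \mathrm{Err}_k \sim N\!\left(\mathrm{Err}_k, \sigma_k^2\right), \qquad \mathrm{Err}_k \sim N\!\left(\mu, \tau^2\right), \qquad k = 0, 1, \dots, K.
$$

Under these assumptions the posterior mean of $\mathrm{Err}_0$ shrinks the naive single-split estimate toward the overall mean:

$$
E\!\left[\mathrm{Err}_0 \mid \widehat{\mathrm{Err}}_0\right] = \mu + \frac{\tau^2}{\tau^2 + \sigma_0^2}\left(\widehat{\mathrm{Err}}_0 - \mu\right).
$$

When the test set is small, $\sigma_0^2$ is large and the estimate leans on $\mu$; when the variation across splits $\tau^2$ dominates, it stays close to $\widehat{\mathrm{Err}}_0$.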

Abstract

Cross-validation is a standard tool for obtaining an honest assessment of the performance of a prediction model. The commonly used version repeatedly splits data, trains the prediction model on the training set, evaluates the model performance on the test set, and averages the model performance across different data splits. A well-known criticism is that such a cross-validation procedure does not directly estimate the performance of the particular model recommended for future use. In this paper, we propose a new method to estimate the performance of a model trained on a specific (random) training set. A naive estimator can be obtained by applying the model to a disjoint testing set. Surprisingly, cross-validation estimators computed from other random splits can be used to improve this naive estimator within a random-effects model framework. We develop two estimators -- a hierarchical Bayesian estimator and an empirical Bayes estimator -- that perform similarly to or better than both the conventional cross-validation estimator and the naive single-split estimator. Simulations and a real-data example demonstrate the superior performance of the proposed method.
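
The way estimates from other splits can improve the naive estimator can be sketched with a generic shrinkage rule. The Python below is a minimal illustration under strong assumptions, not the authors' estimator: it treats split-level estimates as independent with known sampling variances, estimates the between-split variance by a simple method of moments, and uses hypothetical names (`eb_shrinkage`, `err_hat_0`, `var_0`, `err_hat_cv`, `var_cv`) that do not come from the paper.

```python
import numpy as np


def eb_shrinkage(err_hat_0, var_0, err_hat_cv, var_cv):
    """Shrink a single-split estimate toward the mean of other split estimates.

    Assumptions (illustrative only): split estimates are independent,
    Gaussian, and their sampling variances are known and positive.
    Returns the shrunken estimate and the weight placed on err_hat_0.
    """
    err_hat_cv = np.asarray(err_hat_cv, dtype=float)
    var_cv = np.asarray(var_cv, dtype=float)
    if var_0 <= 0:
        raise ValueError("var_0 must be positive")

    # Overall mean across the other splits (the conventional CV estimate).
    mu_hat = err_hat_cv.mean()

    # Method-of-moments between-split variance: observed spread minus the
    # average sampling variance, truncated at zero.
    tau2_hat = max(err_hat_cv.var(ddof=1) - var_cv.mean(), 0.0)

    # Weight on the naive estimate; small when the test set is noisy.
    weight = tau2_hat / (tau2_hat + var_0)
    return weight * err_hat_0 + (1.0 - weight) * mu_hat, weight


# Example: a noisy single-split estimate is pulled toward the CV mean.
estimate, weight = eb_shrinkage(
    err_hat_0=4.0, var_0=0.5,
    err_hat_cv=[1.0, 2.0, 3.0], var_cv=[0.5, 0.5, 0.5],
)
print(estimate, weight)
```

The two limiting cases show the design choice: if the split estimates vary no more than their sampling noise explains, the weight is zero and the result is the ordinary cross-validation average; if the between-split variance dominates, the result approaches the naive single-split estimate.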

Paper Structure

This paper contains 11 sections, 24 equations, 4 figures.

Figures (4)

  • Figure 1: Mean absolute estimation error of $\widehat{\mathrm{Err}}_0$, $\widehat{\mathrm{Err}}_0^{CV}$, $\widehat{\mathrm{Err}}_0^{EB}$, and $\widehat{\mathrm{Err}}_0^{B}$ for varying test set sizes. Here $\mathrm{Err}_0$ is the mean squared prediction error for a continuous outcome.
  • Figure 2: Mean absolute estimation error of $\widehat{\mathrm{Err}}_0$, $\widehat{\mathrm{Err}}_0^{CV}$, $\widehat{\mathrm{Err}}_0^{EB}$, and $\widehat{\mathrm{Err}}_0^{B}$ for varying test set sizes. Here $\mathrm{Err}_0$ is the area under the receiver operating characteristic curve (AUC) for a binary outcome.
  • Figure 3: Mean squared prediction error of a random forest for hourly rental counts with the sample size in training sets varying from 50 to 200. The dotted line is the mean squared prediction error of predicting the hourly counts by their sample mean, a constant.
  • Figure 4: Mean absolute estimation error of $\widehat{\mathrm{Err}}_0$, $\widehat{\mathrm{Err}}_0^{CV}$, $\widehat{\mathrm{Err}}_0^{EB}$, and $\widehat{\mathrm{Err}}_0^{B}$ for varying validation sizes in the bike-sharing data. Here $\mathrm{Err}_0$ is the mean squared prediction error of a random forest for hourly rental counts.

Theorems & Definitions (3)

  • Remark 1
  • Remark 2
  • Remark 3