From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

Karun Adusumilli, Maximilian Kasy, Ashia Wilson

Abstract

We derive the asymptotic risk function of regularized empirical risk minimization (ERM) estimators tuned by $n$-fold cross-validation (CV). The out-of-sample prediction loss of such estimators converges in distribution to the squared-error loss (risk function) of shrinkage estimators in the normal means model, tuned by Stein's unbiased risk estimate (SURE). This risk function provides a more fine-grained picture of predictive performance than uniform bounds on worst-case regret, which are common in learning theory: it quantifies how risk varies with the true parameter. As key intermediate steps, we show that (i) $n$-fold CV converges uniformly to SURE, and (ii) while SURE typically has multiple local minima, its global minimum is generically well separated. Well-separation ensures that uniform convergence of CV to SURE translates into convergence of the tuning parameter chosen by CV to that chosen by SURE.
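For intuition about the limiting experiment, the following is a minimal sketch of SURE-based tuning in the normal means model. It assumes the simplest possible case, which is not taken from the paper: $X \sim N(\theta, I_k)$ with known unit variance and the linear shrinkage estimator $(1 - c)X$ with $c \in [0, 1]$. Writing the estimator as $X + g(X)$ with $g(X) = -cX$, Stein's formula gives $\mathrm{SURE}(c) = k + c^2 \lVert X \rVert^2 - 2ck$, which is minimized at $c = \min(1, k / \lVert X \rVert^2)$. All function names are hypothetical, and the paper's setting (general regularized ERM, where SURE may have several local minima) is considerably broader.

```python
import random


def sure_linear_shrinkage(x, c):
    """SURE for the estimator (1 - c) * x, assuming x ~ N(theta, I_k).

    With g(x) = -c * x, Stein's formula yields
    SURE = k + ||g(x)||^2 + 2 * div g(x) = k + c^2 * ||x||^2 - 2 * c * k.
    """
    k = len(x)
    norm_sq = sum(v * v for v in x)
    return k + c * c * norm_sq - 2.0 * c * k


def tune_by_sure(x):
    """Return the shrinkage factor c in [0, 1] that minimizes SURE.

    SURE is a convex quadratic in c, so the constrained minimizer is the
    unconstrained one, k / ||x||^2, clipped to the interval [0, 1].
    """
    k = len(x)
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        return 1.0
    return min(1.0, k / norm_sq)


def simulate_risk(theta, n_draws=2000, seed=0):
    """Monte Carlo estimate of the squared-error risk of the SURE-tuned estimator."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Draw one observation from the normal means model.
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        c_hat = tune_by_sure(x)
        total += sum(((1.0 - c_hat) * xi - t) ** 2 for xi, t in zip(x, theta))
    return total / n_draws


if __name__ == "__main__":
    k = 10
    # Trace the risk as the true parameter moves away from the origin.
    for scale in (0.0, 1.0, 3.0):
        theta = [scale] * k
        print(scale, round(simulate_risk(theta), 2))
```

Evaluating `simulate_risk` over a range of `theta` values traces out a risk function of the kind the abstract describes: risk as a function of the true parameter, as opposed to a single worst-case bound.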

Paper Structure (36 sections, 12 theorems, 153 equations, 2 figures)

Key Result

Theorem 1

Figures (2)

  • Figure 1: Risk function for JS-shrinkage, dimension 10
  • Figure 2: Examples of multi-modality of SURE

Theorems & Definitions (23)

  • Definition 1: Loss, empirical loss, and expected loss
  • Definition 2: Estimators of $\theta_0$
  • Definition 3: Estimators of risk
  • Definition 4: Tuned estimators of $\theta$
  • Definition 5: Risk functions
  • Theorem 1
  • Corollary 1
  • Lemma 1: Lipschitz $g^\lambda$
  • Lemma 2: Influence function approximation
  • Lemma 3: Limiting squared error loss
  • ...and 13 more