From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

Karun Adusumilli; Maximilian Kasy; Ashia Wilson

Abstract

We derive the asymptotic risk function of regularized empirical risk minimization (ERM) estimators tuned by $n$-fold cross-validation (CV). The out-of-sample prediction loss of such estimators converges in distribution to the squared-error loss (risk function) of shrinkage estimators in the normal means model, tuned by Stein's unbiased risk estimate (SURE). This risk function provides a more fine-grained picture of predictive performance than uniform bounds on worst-case regret, which are common in learning theory: it quantifies how risk varies with the true parameter. As key intermediate steps, we show that (i) $n$-fold CV converges uniformly to SURE, and (ii) while SURE typically has multiple local minima, its global minimum is generically well separated. Well-separation ensures that uniform convergence of CV to SURE translates into convergence of the tuning parameter chosen by CV to that chosen by SURE.
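The limiting object in the abstract is a shrinkage estimator in the normal means model, tuned by Stein's unbiased risk estimate. The sketch below illustrates only the simplest case and is not the paper's construction. It assumes linear (ridge-type) shrinkage $\hat\theta_\lambda(X) = X/(1+\lambda)$ with $X \sim N(\theta, \sigma^2 I_k)$ and known $\sigma^2$, and uses $\mathrm{SURE}(\lambda) = \|X - \hat\theta_\lambda\|^2 + 2\sigma^2 \,\mathrm{div}\,\hat\theta_\lambda(X) - k\sigma^2$. The function names `sure_linear_shrinkage` and `tune_by_sure` are hypothetical.

```python
import numpy as np

def sure_linear_shrinkage(x, lam, sigma2=1.0):
    """Stein's unbiased estimate of E||theta_hat - theta||^2 for
    theta_hat = x / (1 + lam), assuming X ~ N(theta, sigma2 * I_k)."""
    k = x.size
    c = 1.0 / (1.0 + lam)                    # shrinkage factor in (0, 1]
    resid = (1.0 - c) ** 2 * np.sum(x ** 2)  # ||x - theta_hat||^2
    divergence = k * c                       # divergence of the map x -> c * x
    return resid + 2.0 * sigma2 * divergence - k * sigma2

def tune_by_sure(x, lams, sigma2=1.0):
    """Grid search: return the lam minimizing SURE, plus the SURE curve."""
    sure = np.array([sure_linear_shrinkage(x, lam, sigma2) for lam in lams])
    return lams[np.argmin(sure)], sure

# Usage: one draw from the normal means model in dimension k = 10.
rng = np.random.default_rng(0)
k = 10
theta = np.full(k, 1.5)
x = theta + rng.standard_normal(k)
lams = np.linspace(0.0, 20.0, 2001)
lam_hat, sure_curve = tune_by_sure(x, lams)
theta_hat = x / (1.0 + lam_hat)
print(f"lam_hat = {lam_hat:.2f}, loss = {np.sum((theta_hat - theta) ** 2):.3f}")
```

For this linear family, SURE is convex in the shrinkage factor $c = 1/(1+\lambda)$. It is minimized at $c = 1 - k\sigma^2/\|X\|^2$, which is a James-Stein-type factor with $k$ in place of $k-2$. For that reason, this toy case does not exhibit the multiple local minima of SURE discussed in the abstract.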

Paper Structure (36 sections, 12 theorems, 153 equations, 2 figures)

Introduction
Background
Risk functions
Main result
Key steps
Literature
Roadmap
Setup
Definitions
Scaling
Estimators of $\theta$ and their asymptotic counterparts
Estimators of risk
Tuned estimators
Assumptions
Main result and intermediate lemmas
...and 21 more sections

Key Result

Theorem 1

Figures (2)

Figure 1: Risk function for JS-shrinkage, dimension 10
Figure 2: Examples of multi-modality of SURE
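Figure 1 shows a risk function for James-Stein shrinkage in dimension 10. A curve of this kind can be approximated by Monte Carlo. The sketch below assumes the plain (not positive-part) James-Stein estimator with $\sigma^2 = 1$; the figure may use a different variant. The helpers `james_stein` and `js_risk` are hypothetical. Because the risk depends on $\theta$ only through $\|\theta\|$, the sketch places all of $\theta$ on the first coordinate.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Plain James-Stein estimator (1 - (k-2) sigma2 / ||x||^2) x, for k >= 3.
    Works row-wise on an (n_sims, k) array."""
    k = x.shape[-1]
    norm2 = np.sum(x ** 2, axis=-1, keepdims=True)
    return (1.0 - (k - 2) * sigma2 / norm2) * x

def js_risk(theta_norm, k=10, n_sims=200_000, seed=0):
    """Monte Carlo estimate of E||theta_hat_JS - theta||^2 with sigma2 = 1."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(k)
    theta[0] = theta_norm                    # risk depends only on ||theta||
    x = theta + rng.standard_normal((n_sims, k))
    loss = np.sum((james_stein(x) - theta) ** 2, axis=1)
    return loss.mean()

for r in [0.0, 1.0, 2.0, 4.0, 8.0]:
    print(f"||theta|| = {r:4.1f}  risk ~ {js_risk(r):.3f}  (risk of X itself = 10)")
```

At $\theta = 0$, the exact risk is $k - (k-2)^2 \, E[1/\chi^2_k] = k - (k-2) = 2$, since $E[1/\chi^2_k] = 1/(k-2)$. This gives a check on the simulation. As $\|\theta\|$ grows, the risk increases toward $k$, the risk of the unshrunk estimator $X$.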

Theorems & Definitions (23)

Definition 1: Loss, empirical loss, and expected loss
Definition 2: Estimators of $\theta_0$
Definition 3: Estimators of risk
Definition 4: Tuned estimators of $\theta$
Definition 5: Risk functions
Theorem 1
Corollary 1
Lemma 1: Lipschitz $g^\lambda$
Lemma 2: Influence function approximation
Lemma 3: Limiting squared error loss
...and 13 more
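The list includes estimators of risk (Definition 3) and an influence function approximation (Lemma 2). With $n$ observations, $n$-fold CV is leave-one-out CV. For ridge regression, a standard linear-algebra identity gives the exact leave-one-out residual as $e_i/(1 - H_{ii})$, where $H$ is the hat matrix. The sketch below uses that identity and checks it against brute-force refitting. It illustrates why leave-one-out can be read off a single fit; it is not the paper's Lemma 2. The sketch also computes a SURE-type criterion for the fitted values, assuming a known noise variance $\sigma^2$. All function names are hypothetical.

```python
import numpy as np

def ridge_hat_matrix(X, lam):
    """H = X (X'X + lam I)^{-1} X', so that ridge fitted values are H y."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def loo_cv_closed_form(X, y, lam):
    """Leave-one-out CV for ridge via the identity e_(-i) = e_i / (1 - H_ii)."""
    H = ridge_hat_matrix(X, lam)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

def loo_cv_brute_force(X, y, lam):
    """Refit ridge n times, each time dropping one observation."""
    n, p = X.shape
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        errs[i] = y[i] - X[i] @ beta
    return np.mean(errs ** 2)

def sure_ridge(X, y, lam, sigma2):
    """SURE for the fitted values H y, per observation:
    (||y - H y||^2 + 2 sigma2 tr(H) - n sigma2) / n."""
    n = X.shape[0]
    H = ridge_hat_matrix(X, lam)
    return (np.sum((y - H @ y) ** 2) + 2 * sigma2 * np.trace(H) - n * sigma2) / n

# Usage: compare where the two criteria are minimized over a grid of lam.
rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.standard_normal((n, p))
y = X @ (0.2 * rng.standard_normal(p)) + rng.standard_normal(n)
lams = np.logspace(-2, 3, 50)
cv = [loo_cv_closed_form(X, y, lam) for lam in lams]
sure = [sure_ridge(X, y, lam, 1.0) for lam in lams]
print("argmin CV:", lams[np.argmin(cv)], " argmin SURE:", lams[np.argmin(sure)])
```

CV estimates out-of-sample prediction error, which includes the noise level $\sigma^2$. SURE as written estimates the in-sample error of the fitted mean. Their levels therefore differ, and the comparison above is between the tuning parameters they select rather than between the criterion values themselves.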
