
The asymptotic effect of tuning parameters

Ingrid Dæhlen, Nils Lid Hjort, Ingrid Hobæk Haff

Abstract

Tuning parameters are parameters involved in an estimation procedure for the purpose of reducing the risk of some other estimator. Examples include the degree of penalization in penalized regression and likelihood problems, as well as the balance parameter in hybrid methods. Typically, tuning parameters are set to the minimizers of some estimator of the risk, a step which introduces additional randomness and makes standard methodology inapplicable. We derive precise asymptotic theory for this situation. Our framework allows for smooth, but otherwise arbitrary, loss functions and for the risk to be estimated by cross-validation procedures. Results include consistency of the optimal estimator towards a well-defined quantity and asymptotic normality after proper scaling and centring. We give explicit forms and estimators for the limiting variance matrix, as well as results sharply characterizing the distance from the training error to the cross-validated estimator of the risk.
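
To make the setting concrete, here is a minimal sketch (not from the paper) of the tuning step the abstract describes, with ridge regression standing in for the penalized-regression example and $K$-fold cross-validation as the risk estimator; the function names, the grid, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Penalized least-squares estimate theta_hat(lam) for a fixed tuning lam."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_risk(X, y, lam, n_folds=5, seed=0):
    """K-fold cross-validated estimate of the squared-error risk at lam."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(errors)

# The tuning parameter is set to the minimizer of the estimated risk;
# this data-driven choice is the extra randomness the paper analyzes.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5]) + rng.standard_normal(100)
grid = np.logspace(-3, 2, 50)
lam_hat = min(grid, key=lambda lam: cv_risk(X, y, lam))
beta_hat = ridge_fit(X, y, lam_hat)
```

Standard errors computed as if `lam_hat` were a fixed constant ignore the randomness of the minimization step; the paper's results describe the joint limiting behaviour of the estimator and the data-driven tuning parameter.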

Paper Structure

This paper contains 15 sections, 9 theorems, 11 equations, and 2 figures.

Key Result

theorem 1

For each fixed $\lambda$, let $\hat{\theta}(\lambda)$ be the unique solution to $n^{-1}\sum_{i=1}^n\varphi(Z_i,\theta,\lambda)=0$. Furthermore, assume that $\hat{\lambda}$ is the unique solution to $\mathrm{TE}'(\lambda)=o_{p}(1/\sqrt{n})$, where $\mathrm{TE}$ denotes the training error. Let $\alpha = (\theta^T,\lambda^T,D^T)^T\in\mathbb{R}^{p+q+pq}$, and suppose the function $\Psi$ is continuously differentiable in a neighbourhood of $\alpha_0$. Then ...
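
For intuition, the estimating-equation setup in the theorem can be made concrete with a small numerical sketch. The choice of a ridge-penalized least-squares score as $\varphi$ is purely illustrative (the theorem allows any smooth $\varphi$), and the names `phi_bar` and `theta_hat` are ours, not the paper's.

```python
import numpy as np
from scipy.optimize import root

def phi_bar(theta, lam, X, y):
    """n^{-1} sum_i phi(Z_i, theta, lam) for the (assumed) ridge-penalized
    least-squares score: phi(z, theta, lam) = (y - x^T theta) x - lam * theta."""
    return X.T @ (y - X @ theta) / len(y) - lam * theta

def theta_hat(lam, X, y):
    """Root hat{theta}(lam) of the estimating equation at a fixed lam."""
    sol = root(phi_bar, x0=np.zeros(X.shape[1]), args=(lam, X, y))
    return sol.x

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.standard_normal(200)
print(theta_hat(0.1, X, y))  # shrinks towards zero as lam grows
```

Plugging $\hat{\theta}(\hat{\lambda})$ back in, with $\hat{\lambda}$ chosen by minimizing an estimated risk, yields the randomly tuned estimator whose limit distribution the theorem characterizes.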

Figures (2)

  • Figure 1: Histograms of draws from the bootstrap distribution of $\hat{\beta}(\hat{\lambda})_j$ for $j=0,\ldots,8$, together with the density of the approximate distribution (see the bootstrap sketch after this list).
  • Figure 2: The plot shows the absolute error of the classic and the new estimated covariance of $\sqrt{n}\hat{\beta}_C(\hat{\lambda}_C)$ compared to the observed covariance based on one thousand simulated data sets with $n=100$ data points each. The variables $V_{jk}$ refer to the covariance between the $j$-th and $k$-th components of $\sqrt{n}\hat{\beta}_C(\hat{\lambda}_C)$ for $j,k=0,1,2$.
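
A bootstrap distribution of the kind compared in Figure 1 could be generated as follows. This is a self-contained sketch under the same illustrative ridge/cross-validation assumptions as above; `fit_with_tuning`, the grid, and the data-generating model are our assumptions, not the paper's code.

```python
import numpy as np

def fit_with_tuning(X, y, grid, rng, n_folds=5):
    """Re-select hat{lambda} by K-fold CV and return hat{beta}(hat{lambda})."""
    def ridge(Xt, yt, lam):
        p = Xt.shape[1]
        return np.linalg.solve(Xt.T @ Xt + lam * np.eye(p), Xt.T @ yt)
    def cv_risk(lam):
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        return np.mean([np.mean((y[f] - X[f] @ ridge(np.delete(X, f, 0),
                                                     np.delete(y, f), lam)) ** 2)
                        for f in folds])
    lam_hat = min(grid, key=cv_risk)
    return ridge(X, y, lam_hat)

rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 3))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n)
grid = np.logspace(-2, 1, 20)

B = 500
boot = np.empty((B, X.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, n)          # resample rows with replacement
    boot[b] = fit_with_tuning(X[idx], y[idx], grid, rng)
# boot[:, j] approximates the bootstrap distribution of hat{beta}(hat{lambda})_j,
# the object whose histogram is compared with the approximate density in Figure 1.
```

Note that the tuning parameter is re-selected inside every bootstrap replication, so the draws reflect the additional randomness from the data-driven choice of $\hat{\lambda}$.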

Theorems & Definitions (19)

  • theorem 1
  • remark
  • proof
  • corollary 1
  • proof
  • theorem 2
  • remark
  • proof
  • theorem 3
  • proof
  • ...and 9 more