Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression

Lucas Clarté, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

Abstract

We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, taking the limit where the number of samples $n$ and dimension $d$ of the covariates grow at a comparable fixed rate $\alpha = n/d$. Our findings are three-fold: i) resampling methods are fraught with problems in high dimensions and exhibit the double-descent-like behavior typical of these situations; ii) only when $\alpha$ is large enough do they provide consistent and reliable error estimates (we give convergence rates); iii) in the over-parametrized regime $\alpha < 1$ relevant to modern machine learning practice, their predictions are not consistent, even with optimal regularization.
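To make the quantities in the abstract concrete, here is a minimal simulation sketch, not the authors' code, that compares the "true" variance $\mathrm{Var}_{\mathcal{D}}$ of a ridge estimator (estimated by refitting on fresh training sets) with the pair-bootstrap, subsampling and delete-one jackknife estimates computed from a single dataset. Everything in it is an assumption for illustration: the teacher model (isotropic Gaussian covariates scaled by $1/\sqrt{d}$, linear labels with Gaussian noise), the ridge normalization, the use of half-samples without replacement, the summary "per-coordinate variance averaged over coordinates", and all function names (`ridge`, `sample_data`, `pair_bootstrap_variance`, etc.) are hypothetical, and the parameter values are not the paper's settings.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate: argmin_w ||y - X w||^2 / 2 + lam * ||w||^2 / 2 (assumed normalization)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def sample_data(rng, n, d, w_star, noise):
    """Assumed teacher model: x_i ~ N(0, I_d / d) and y_i = <w_star, x_i> + noise * eps_i."""
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = X @ w_star + noise * rng.standard_normal(n)
    return X, y


def mean_coord_variance(fits):
    """Per-coordinate variance across refits, averaged over the d coordinates."""
    return float(np.var(np.asarray(fits), axis=0).mean())


def true_variance(rng, n, d, w_star, noise, lam, n_datasets=200):
    """Monte Carlo Var_D: spread of the estimator over fresh, independent training sets."""
    fits = [ridge(*sample_data(rng, n, d, w_star, noise), lam) for _ in range(n_datasets)]
    return mean_coord_variance(fits)


def pair_bootstrap_variance(rng, X, y, lam, B=100):
    """Pair bootstrap: resample (x_i, y_i) pairs with replacement and refit."""
    n = X.shape[0]
    fits = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        fits.append(ridge(X[idx], y[idx], lam))
    return mean_coord_variance(fits)


def subsampling_variance(rng, X, y, lam, B=100):
    """Half-sampling: refit on n // 2 rows drawn without replacement (no rescaling)."""
    n = X.shape[0]
    fits = []
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)
        fits.append(ridge(X[idx], y[idx], lam))
    return mean_coord_variance(fits)


def jackknife_variance(X, y, lam):
    """Delete-one jackknife: (n - 1) / n * sum_i (theta_(-i) - mean)^2, per coordinate."""
    n = X.shape[0]
    fits = [ridge(np.delete(X, i, axis=0), np.delete(y, i), lam) for i in range(n)]
    # np.var divides by n, so multiplying by (n - 1) yields the jackknife factor (n - 1) / n.
    return (n - 1) * mean_coord_variance(fits)


rng = np.random.default_rng(0)
d, alpha, lam, noise = 50, 4.0, 1e-2, 0.5  # illustrative values, not the paper's settings
n = int(alpha * d)
w_star = rng.standard_normal(d)
X, y = sample_data(rng, n, d, w_star, noise)

estimates = {
    "true Var_D": true_variance(rng, n, d, w_star, noise, lam),
    "pair bootstrap": pair_bootstrap_variance(rng, X, y, lam),
    "subsampling (half)": subsampling_variance(rng, X, y, lam),
    "jackknife": jackknife_variance(X, y, lam),
}
for name, value in estimates.items():
    print(f"{name:>20s}: {value:.4g}")
```

The half-sample size is a deliberate choice: for a sample mean, subsampling half the data without replacement reproduces the full-sample variance without any rescaling. Whether such estimates remain faithful for ridge when $d$ grows with $n$ is the question the abstract addresses; rerunning the script for several values of `alpha` and `lam` gives a rough empirical feel for it.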

Paper Structure (60 sections, 4 theorems, 132 equations, 3 figures, 1 table, 1 algorithm)

Key Result

Theorem 4.1

Let $\mathcal{D} = \{(\Vec{x}_{i}, y_{i})_{i\in[n]}\}$ denote $n$ independent samples drawn from model eq:def_model with log-concave likelihood $p(y|z)$. In the high-dimensional proportional regime $n, d\to\infty$ with $n/d=\alpha$, the overlaps of interest eq:def:overlaps are given by the unique so[…] for a careful choice of the joint distribution of $\Vec{p} = (p_1, p_2)$. In the above, […]

Figures (3)

  • Figure 1: Variances for ridge regression at $\lambda = 10^{-2}$ (Top) and $\lambda = 1$ (Bottom). Left: variance of pair resampling methods and of Bayes-posterior. Middle: variance of conditional resampling and residual bootstrap. Right: difference between the true variances $\mathop{\mathrm{Var}}\nolimits_{\mathcal{D}}(\hat{\boldsymbol{\theta}}_{\lambda})$, $\mathop{\mathrm{Var}}\nolimits_{\mathcal{D}|\boldsymbol{X}}(\hat{\boldsymbol{\theta}}_{\lambda})$ and their estimates. Dots show simulations at $d = 200$, with $B = 10$ resamples for bootstrap and subsampling.
  • Figure 2: Bias of ridge regression and its estimation using pair bootstrap and subsampling at $\lambda = 10^{-2}$ (Top) and $\lambda = 1$ (Bottom). Left: bias of pair resampling methods. Middle: conditional bias and bias of residual bootstrap. Right: difference between the various biases.
  • Figure 3: Variance for logistic regression at $\lambda = 10^{-2}$ (Top) and $\lambda = 1$ (Bottom). Left: variance of full resampling, pair bootstrap, subsampling. Right: variance of label resampling, residual bootstrap. See Figure 1 for the legend.
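The middle panels of Figure 1 compare the conditional variance $\mathrm{Var}_{\mathcal{D}|\boldsymbol{X}}$, where the design matrix is held fixed and only the labels vary, with the residual bootstrap. The sketch below estimates both for ridge regression at one over-parametrized ($\alpha < 1$) and one under-parametrized point. It is a hedged illustration, not the authors' implementation: the isotropic Gaussian design, linear teacher with Gaussian noise, ridge normalization, centered residuals without any leverage correction, the coordinate-averaged variance summary, and the function names `conditional_variance` and `residual_bootstrap_variance` are all assumptions, and it makes no claim to reproduce the paper's curves.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate: argmin_w ||y - X w||^2 / 2 + lam * ||w||^2 / 2 (assumed normalization)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def conditional_variance(rng, X, w_star, noise, lam, n_draws=200):
    """Monte Carlo Var_{D|X}: keep the design X fixed and redraw only the label noise."""
    n = X.shape[0]
    fits = [ridge(X, X @ w_star + noise * rng.standard_normal(n), lam) for _ in range(n_draws)]
    return float(np.var(np.asarray(fits), axis=0).mean())


def residual_bootstrap_variance(rng, X, y, lam, B=100):
    """Residual bootstrap: refit on y* = X w_hat + resampled centered residuals, X fixed."""
    n = X.shape[0]
    w_hat = ridge(X, y, lam)
    resid = y - X @ w_hat
    resid = resid - resid.mean()  # centering is a common convention, assumed here
    fits = []
    for _ in range(B):
        y_star = X @ w_hat + rng.choice(resid, size=n, replace=True)
        fits.append(ridge(X, y_star, lam))
    return float(np.var(np.asarray(fits), axis=0).mean())


rng = np.random.default_rng(0)
d, lam, noise = 50, 1e-2, 0.5  # illustrative values, not the paper's settings
w_star = rng.standard_normal(d)
results = {}
for alpha in (0.5, 4.0):  # one over-parametrized and one under-parametrized point
    n = int(alpha * d)
    X = rng.standard_normal((n, d)) / np.sqrt(d)  # assumed isotropic Gaussian design
    y = X @ w_star + noise * rng.standard_normal(n)
    results[alpha] = (conditional_variance(rng, X, w_star, noise, lam),
                      residual_bootstrap_variance(rng, X, y, lam))
    print(f"alpha={alpha}: Var_D|X = {results[alpha][0]:.4g}, "
          f"residual bootstrap = {results[alpha][1]:.4g}")
```

Because the residual bootstrap only resamples residuals of the fitted model, a nearly interpolating fit (small $\lambda$ with $\alpha < 1$) leaves very little residual to resample, so one would expect the two numbers to differ most in that regime; the script is one way to check this on a given configuration.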

Theorems & Definitions (7)

  • Theorem 4.1: Biases and Variances for pair resampling in ridge regression
  • Theorem 4.2: Biases and Variances for conditional resampling in ridge regression
  • Proposition C.1
  • Remark C.2
  • Remark C.3
  • Proposition C.4
  • Remark C.5