Bootstrapping Lasso in Generalized Linear Models

Mayukh Choudhury, Debraj Das

TL;DR

This work addresses inference for Lasso-regularized GLMs, where the asymptotic distribution of $\hat{\bm{\beta}}_n$ is intractable. It develops two bootstrap approaches, Perturbation Bootstrap (PB) and Pearson's Residual Bootstrap (PRB), and shows naive implementations fail due to variable-selection inconsistency; a thresholded Lasso centering strategy yields valid bootstrap pivots. The authors prove consistency results for the modified PB and PRB approximations, support them with a moderately large simulation study across GLM submodels, and demonstrate applicability on a real clinical dataset. The methods extend to both fixed and random designs, enabling reliable, model-selection-aware inference for high-dimensional GLMs and practical confidence regions for submodels such as logistic, gamma, and linear regression.

Abstract

Generalized linear models (GLMs) constitute a large class of models and essentially extend ordinary linear regression by connecting the mean of the response variable with the covariates through appropriate link functions. Lasso, on the other hand, is a popular and easy-to-implement penalization method in regression when not all covariates are relevant. However, the asymptotic distributional properties of the Lasso estimator in GLM are still unknown. In this paper, we show that the asymptotic distribution of the Lasso estimator in GLM does not have a tractable form, and subsequently we develop two Bootstrap methods, namely the Perturbation Bootstrap and Pearson's Residual Bootstrap, for approximating the distribution of the Lasso estimator in GLM. As a result, our Bootstrap methods can be used to draw valid statistical inferences for any sub-model of GLM. We support our theoretical findings by showing good finite-sample properties of the proposed Bootstrap methods through a moderately large simulation study. We also implement one of our Bootstrap methods on a real data set.
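
For orientation, the objects discussed above can be written out in a common textbook form. This is a sketch under assumptions, not the paper's own notation: it assumes a canonical link with cumulant function $b(\cdot)$, covariate vectors $\bm{x}_i \in \mathbb{R}^p$, a penalty level $\lambda_n$, and a threshold sequence $a_n$ (the symbol $a_n$ appears in the figure captions; the other symbols are introduced here for illustration). Under these assumptions, the Lasso estimator in a GLM minimizes the penalized negative log-likelihood:

$$
\hat{\bm{\beta}}_n \in \operatorname*{arg\,min}_{\bm{\beta} \in \mathbb{R}^p} \left\{ -\sum_{i=1}^{n} \left[ y_i\, \bm{x}_i^\top \bm{\beta} - b\!\left(\bm{x}_i^\top \bm{\beta}\right) \right] + \lambda_n \sum_{j=1}^{p} |\beta_j| \right\}
$$

A thresholded version of this estimator, of the kind the TL;DR refers to as the centering for the bootstrap pivots, would typically set small components to zero:

$$
\tilde{\beta}_{n,j} = \hat{\beta}_{n,j}\, \mathbb{1}\!\left( |\hat{\beta}_{n,j}| > a_n \right), \qquad j = 1, \dots, p
$$

Taking $b(\theta) = \theta^2/2$ in the first display gives linear regression, and $b(\theta) = \log(1 + e^{\theta})$ gives logistic regression. The exact definitions and scaling used by the authors may differ from this sketch.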

Paper Structure (22 sections, 18 theorems, 123 equations, 8 figures, 18 tables)

Key Result

Theorem 3.1

Under assumptions (C.1)-(C.6), we have

Figures (8)

  • Figure 1: Coverage error of two-sided $90\%$ confidence intervals as a function of sample size $n$ in Gamma and Linear regression with $K$-fold CV.
  • Figure 2: Coverage error of two-sided $90\%$ confidence intervals as a function of sample size $n$ in logistic regression.
  • Figure 3: Coverage error of two-sided $90\%$ confidence intervals as a function of sample size $n$ in Gamma and Linear regression with fixed CV choice.
  • Figure 4: Coverage error of two-sided $90\%$ confidence intervals for $a_n=n^{-0.0015}$.
  • Figure 5: Coverage error of two-sided $90\%$ confidence intervals for $a_n=n^{-1/6}$.
  • ...and 3 more figures

Theorems & Definitions (22)

  • Remark 2.1
  • Theorem 3.1
  • Remark 4.1
  • Theorem 5.1
  • Theorem 5.2
  • Remark 5.1
  • Theorem 6.1
  • Remark 6.1
  • Lemma A.1
  • Lemma A.2
  • ...and 12 more