Selective inference using randomized group lasso estimators for general models

Yiling Huang, Sarah Pirenne, Snigdha Panigrahi, Gerda Claeskens

Abstract

Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The method covers exponential family distributions as well as, for example, quasi-likelihood modeling for overdispersed count data, and allows for categorical or grouped covariates as well as continuous covariates. A randomized group-regularized optimization problem is studied. The added randomization allows us to construct a post-selection likelihood which we show to be adequate for selective inference when conditioning on the event of the selection of the grouped covariates. This likelihood also provides a selective point estimator, accounting for the selection by the group lasso. Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume. The selective inference method for the group lasso is illustrated on data from the National Health and Nutrition Examination Survey, while simulations showcase its behaviour and favorable comparison with other methods.
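
To make the idea of a randomized group-regularized optimization problem concrete, here is a minimal sketch, not the authors' implementation. It assumes a squared-error (Gaussian) loss, a Gaussian randomization vector `omega` entering the objective as a linear term, and a plain proximal-gradient solver. The function names, the penalty level `lam`, and the simulated data are all hypothetical. The paper itself treats a much wider class of losses and may scale the terms differently.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def randomized_group_lasso(X, y, groups, lam, omega, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam * sum_g ||b_g||_2 - omega @ b
    by proximal gradient descent (the squared-error loss is an assumption)."""
    p = X.shape[1]
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Gradient of the smooth part, including the linear randomization term.
        grad = X.T @ (X @ beta - y) - omega
        z = beta - step * grad
        # The group penalty is separable, so the prox acts block by block.
        for idx in groups:
            beta[idx] = group_soft_threshold(z[idx], step * lam)
    return beta


# Hypothetical simulated data: three groups, only the first carries signal.
rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 6))
groups = [[0, 1, 2], [3, 4], [5]]
y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + rng.standard_normal(n)
omega = rng.standard_normal(6)  # randomization drawn independently of the data

beta_hat = randomized_group_lasso(X, y, groups, lam=20.0, omega=omega)
# Groups with a nonzero block form the selection event one would condition on.
selected = [g for g, idx in enumerate(groups) if np.any(beta_hat[idx] != 0)]
```

The list `selected` plays the role of the selected grouped covariates; the post-selection likelihood described in the abstract conditions on this event, a step the sketch does not attempt.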

Paper Structure (30 sections, 15 theorems, 125 equations, 4 figures, 3 tables)

Key Result

Lemma 3.1

For the randomized group lasso estimation as in eq:opt.problem, it holds that

Figures (4)

  • Figure 1: Boxplots of coverage rate of individual $90\%$ confidence intervals for Gaussian, logistic, Poisson and negative binomial data.
  • Figure 2: Boxplots of F1-scores for measuring the accuracy of model selection per simulation for Gaussian, logistic, Poisson and negative binomial data.
  • Figure 3: Boxplots of average length of individual $90\%$ confidence intervals per simulation for Gaussian, logistic, Poisson and negative binomial data.
  • Figure 4: $90\%$ confidence intervals post-selection by the group lasso for estimating the incidence of depression: from our post-selection likelihood, data splitting and naive inference.

Theorems & Definitions (29)

  • Lemma 3.1
  • Proposition 3.1
  • Corollary 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4
  • Theorem 3.2
  • Theorem 4.1
  • Lemma 4.2
  • Lemma A.1: Jacobian of Change-of-Variables
  • ...and 19 more