
Post-selection inference in generalized linear models via parametric programming

Qinyan Shen, Karl Gregory, Xianzheng Huang

Abstract

We propose a unified framework for drawing inferences about regression coefficients in a generalized linear model (GLM) following Lasso-based variable selection. We adapt to non-Gaussian GLMs a recently developed parametric programming strategy for post-selection inference in the linear model with a Gaussian response, drawing parallels between maximum likelihood estimation in GLMs and least squares estimation in linear models. We then conduct post-selection inference based on a linearized model for pseudo-response and pseudo-covariate data strategically constructed from the raw data. Using synthetic data generated from regression models for three different types of non-Gaussian responses in simulation experiments, we demonstrate that the proposed method effectively corrects the naive inference that ignores variable selection while achieving greater efficiency than a polyhedral-based post-selection adjustment.
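The linearization step described above parallels the standard iteratively reweighted least squares (IRLS) construction in GLMs, where a working (pseudo) response and weighted pseudo covariates turn one step of maximum likelihood estimation into a weighted least squares problem. The paper's specific pseudo-data construction may differ in detail; the sketch below (function name `glm_pseudo_data` is illustrative, for the logistic case) shows only the standard IRLS linearization, not the authors' exact procedure.

```python
import numpy as np

def glm_pseudo_data(X, y, beta):
    """IRLS-style pseudo (working) response and weighted design for
    logistic regression. Illustrative only: the paper's construction
    for post-selection inference may differ in detail."""
    eta = X @ beta                    # linear predictor
    mu = 1.0 / (1.0 + np.exp(-eta))   # mean under the logit link
    w = mu * (1.0 - mu)               # working weights (variance function)
    z = eta + (y - mu) / w            # working response
    sw = np.sqrt(w)
    return sw[:, None] * X, sw * z    # weighted pseudo covariates/response

# One weighted least-squares solve on the pseudo data is a Newton/IRLS
# step for the logistic MLE; iterating converges to the MLE.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
beta_true = np.array([0.5, 1.0, -1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
beta = np.zeros(3)
for _ in range(25):
    X_tilde, z_tilde = glm_pseudo_data(X, y, beta)
    beta, *_ = np.linalg.lstsq(X_tilde, z_tilde, rcond=None)
```

Applying the Lasso to such pseudo data, rather than to the raw GLM likelihood, is what lets a Gaussian-response post-selection method be reused for non-Gaussian GLMs.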

Paper Structure

This paper contains 20 sections, 2 theorems, 28 equations, 1 figure, 4 tables.

Key Result

Theorem 1

Under the paper's main assumption (labeled `assum:main`), we have (i) $g_{M,j,n}(\boldsymbol{\beta}) \stackrel{d}{\rightarrow} \mathcal{N}(0, a(\phi))$ and (ii) $|\hat{g}_{M,j,n}(\boldsymbol{\beta}) - g_{M,j,n}(\boldsymbol{\beta})| \stackrel{p}{\rightarrow} 0$ as $n \to \infty$.

Figures (1)

  • Figure 1: Average Type I error across $1000$ Monte Carlo replicates over the grid of $\lambda$ values for logistic, Poisson, and beta regression achieved by the PPL method (solid lines) and by the naive method (dashed lines). Heights of bars indicate average sizes of the selected models across the $\lambda$ values.

Theorems & Definitions (2)

  • Theorem 1
  • Corollary 1