Advances in Bayesian model selection consistency for high-dimensional generalized linear models

Jeyong Lee, Minwoo Chae, Ryan Martin

TL;DR

The paper develops a rigorous Bayesian framework for variable selection in high-dimensional generalized linear models (GLMs) by combining a data-driven empirical prior with fractional posteriors. It leverages Spokoiny's non-asymptotic quadratic expansions of the log-likelihood to obtain sharp Laplace approximations of the marginal likelihoods. These yield near-optimal model selection consistency under weaker beta-min conditions. The results also cover Poisson regression, whose score function has sub-exponential rather than sub-Gaussian tails. The paper proves posterior contraction in Hellinger distance. It also shows that, asymptotically, posterior mass concentrates on a small effective model class, while supersets of the true model and models with false negatives are suppressed under mild design-regularity assumptions. Finally, it outlines scalable computation via Metropolis–Hastings MCMC (MH-MCMC) with Laplace-approximated marginal likelihoods and discusses practical hyperparameter regimes. The result is both theoretical guarantees and practical guidance for Bayesian high-dimensional GLMs.
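The computational strategy mentioned above (MH-MCMC over models, with each model's marginal likelihood replaced by a Laplace approximation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses a Poisson GLM with log link. In place of the paper's data-dependent empirical prior, it uses a hypothetical $N(0, \tau^2 I)$ coefficient prior and a hypothetical fractional power `alpha`. The model prior is also hypothetical: $\log \pi(S) = -c|S|\log p$, capped at `s_max`. All function names (`fit_poisson`, `log_marginal_laplace`, `mh_model_search`) are illustrative.

```python
import numpy as np
from collections import Counter

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Newton's method (= IRLS for the canonical log link); returns MLE and Fisher information."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ theta)
        fisher = X.T @ (mu[:, None] * X)                  # negative Hessian
        step = np.linalg.solve(fisher, X.T @ (y - mu))    # score = X^T (y - mu)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ theta)
    return theta, X.T @ (mu[:, None] * X)

def log_marginal_laplace(S, X, y, alpha=0.99, tau2=1.0):
    """Laplace approximation, centred at the MLE, of log ∫ L_S(θ)^alpha N(θ; 0, tau2·I) dθ.

    HYPOTHETICAL choices (not the paper's construction): the fractional power alpha
    and the N(0, tau2·I) coefficient prior. The log(y!) constant is dropped.
    """
    if not S:
        return -alpha * len(y)                  # empty model: eta = 0, so mu = 1
    XS = X[:, sorted(S)]
    theta_hat, fisher = fit_poisson(XS, y)
    eta = XS @ theta_hat
    loglik = float(y @ eta - np.exp(eta).sum())
    s = len(S)
    _, logdet = np.linalg.slogdet(alpha * fisher)
    # alpha*loglik + log prior(theta_hat) + (s/2) log(2π) - (1/2) log det(alpha*F)
    return (alpha * loglik - 0.5 * s * np.log(tau2)
            - 0.5 * theta_hat @ theta_hat / tau2 - 0.5 * logdet)

def mh_model_search(X, y, n_iter=3000, s_max=5, c=1.0, seed=1):
    """Metropolis-Hastings over subsets with single-coordinate add/delete proposals.

    HYPOTHETICAL model prior: log pi(S) = -c * |S| * log(p), restricted to |S| <= s_max.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    cache = {}                                  # each model's Laplace value is computed once

    def log_post(S):
        if S not in cache:
            cache[S] = log_marginal_laplace(S, X, y) - c * len(S) * np.log(p)
        return cache[S]

    S = frozenset()
    chain = []
    for _ in range(n_iter):
        j = int(rng.integers(p))
        S_new = S - {j} if j in S else S | {j}  # symmetric flip proposal
        if len(S_new) <= s_max and np.log(rng.uniform()) < log_post(S_new) - log_post(S):
            S = S_new
        chain.append(S)
    return chain

# Simulated example: 3 active covariates out of 20.
rng = np.random.default_rng(0)
n, p = 400, 20
X = 0.5 * rng.normal(size=(n, p))
theta0 = np.zeros(p)
theta0[[0, 1, 2]] = [1.0, -1.0, 0.8]
y = rng.poisson(np.exp(X @ theta0))

chain = mh_model_search(X, y)[500:]             # discard burn-in
best_model, visits = Counter(chain).most_common(1)[0]
incl = {j: np.mean([j in S for S in chain]) for j in range(p)}
print(sorted(best_model), visits / len(chain))
```

Because the flip proposal is symmetric, the acceptance ratio reduces to the ratio of unnormalized posterior model probabilities. Caching the per-model Laplace value means each model's MLE is fitted only once, however often the chain revisits it.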

Abstract

Uncovering genuine relationships between a response variable of interest and a large collection of covariates is a fundamental and practically important problem. In the context of Gaussian linear models, both the Bayesian and non-Bayesian literature is well-developed and there are no substantial differences in the model selection consistency results available from the two schools. For the more challenging generalized linear models (GLMs), however, Bayesian model selection consistency results are lacking in several ways. In this paper, we construct a Bayesian posterior distribution using an appropriate data-dependent prior and develop its asymptotic concentration properties using new theoretical techniques. In particular, we leverage Spokoiny's powerful non-asymptotic theory to obtain sharp quadratic approximations of the GLM's log-likelihood function, which leads to tight bounds on the errors associated with the model-specific maximum likelihood estimators and the Laplace approximation of our Bayesian marginal likelihood. In turn, these improved bounds lead to significantly stronger, near-optimal Bayesian model selection consistency results, e.g., far weaker beta-min conditions, compared to those available in the existing literature. In particular, our results are applicable to the Poisson regression model, in which the score function is not sub-Gaussian.
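The quadratic approximation referred to in the abstract can be illustrated numerically for Poisson regression. The sketch below assumes a log link and drops the $\log(y!)$ constant. It computes a model-specific MLE $\hat\theta$ by Newton's method. It then compares the log-likelihood at a nearby point with its second-order expansion $\ell(\hat\theta) - \tfrac12(\theta-\hat\theta)^\top F(\hat\theta)(\theta-\hat\theta)$, where $F(\hat\theta) = X^\top \mathrm{diag}(e^{X\hat\theta}) X$. This is only a numerical check on simulated data, not Spokoiny's non-asymptotic bounds. The function names and simulation settings are hypothetical.

```python
import numpy as np

def poisson_loglik(theta, X, y):
    """Poisson log-likelihood with log link, dropping the log(y!) constant."""
    eta = X @ theta
    return float(y @ eta - np.exp(eta).sum())

def poisson_mle(X, y, n_iter=50, tol=1e-10):
    """Model-specific MLE via Newton's method (equivalently IRLS for the canonical link)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ theta)
        score = X.T @ (y - mu)                  # gradient of the log-likelihood
        fisher = X.T @ (mu[:, None] * X)        # negative Hessian (Fisher information)
        step = np.linalg.solve(fisher, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def quadratic_approx(theta, theta_hat, X, y):
    """Second-order expansion of the log-likelihood around the MLE (the score vanishes there)."""
    mu_hat = np.exp(X @ theta_hat)
    fisher = X.T @ (mu_hat[:, None] * X)
    d = theta - theta_hat
    return poisson_loglik(theta_hat, X, y) - 0.5 * d @ fisher @ d

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p)) / np.sqrt(p)
theta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ theta_true))

theta_hat = poisson_mle(X, y)
# Perturb at the n^{-1/2} scale, where the quadratic approximation should be accurate.
theta_local = theta_hat + rng.normal(size=p) / np.sqrt(n)
exact = poisson_loglik(theta_local, X, y)
approx = quadratic_approx(theta_local, theta_hat, X, y)
print(f"exact={exact:.4f}  quadratic={approx:.4f}  error={abs(exact - approx):.2e}")
```

Because the score vanishes at $\hat\theta$, the first-order term drops out. The remaining error comes from third- and higher-order terms, which are small when $\theta$ is within $O(n^{-1/2})$ of $\hat\theta$. Per the abstract, the paper's contribution is to bound such errors tightly for the model-specific MLEs and the Laplace approximation.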

Paper Structure

This paper contains 35 sections, 64 theorems, 641 equations, and 2 tables.

Key Result

Lemma 4.1

Suppose that (A1) holds. Then, with ${\mathbb P}_{0}^{(n)}$-probability at least $1 - 2p^{-1}$, the inequalities of the lemma (not reproduced in this summary) hold uniformly for all non-empty $S \in {\mathscr S}_{s_{\max}}$. Here $C > 0$ is a constant depending only on $C_{\rm dev}$, which is defined in the main text for GLMs.

Theorems & Definitions (128)

  • Lemma 4.1
  • proof
  • Theorem 4.2: Effective dimension
  • proof
  • Theorem 4.3: Consistency in Hellinger distance
  • proof
  • Theorem 4.4: Consistency in parameter $\theta$
  • proof
  • Lemma 4.5: Misspecification on ${\mathscr S}_{\Theta_n}$
  • proof
  • ...and 118 more