Posterior Mode Guided Dimension Reduction for Bayesian Model Averaging in Heavy-Tailed Linear Regression

Shamriddha De, Joyee Ghosh

TL;DR

This paper proposes a hybrid method that blends MAP estimation with MCMC-based stochastic search algorithms within a heavy-tailed error framework. In simulation studies and benchmark real-life examples, the method shows several advantages in variable selection and uncertainty quantification over various state-of-the-art methods.

Abstract

For large model spaces, the potential entrapment of Markov chain Monte Carlo (MCMC)-based methods with spike-and-slab priors poses significant challenges in posterior computation in regression models. On the other hand, maximum a posteriori (MAP) estimation, which is a more computationally viable alternative, fails to provide uncertainty quantification. To address these problems simultaneously and efficiently, this paper proposes a hybrid method that blends MAP estimation with MCMC-based stochastic search algorithms within a heavy-tailed error framework. Under hyperbolic errors, the current work develops a two-step expectation conditional maximization (ECM) guided MCMC algorithm. In the first step, we conduct an ECM-based posterior maximization and perform variable selection, thereby identifying a reduced model space in a high posterior probability region. In the second step, we execute a Gibbs sampler on the reduced model space for posterior computation. Such a method is expected to improve the efficiency of posterior computation and enhance its inferential richness. Through simulation studies and benchmark real-life examples, our proposed method is shown to exhibit several advantages in variable selection and uncertainty quantification over various state-of-the-art methods.

Paper Structure

This paper contains 17 sections, 1 theorem, 15 equations, 6 figures, and 7 tables.

Key Result

Proposition 1

Let $A$ and $a$ be random variables such that $A|a^2 \sim \mathrm{N}(0, \rho^2a^2)$ and $a^2 \sim \mathrm{GIG}(1, \eta, \eta)$. Then $A$ is distributed as $\mathrm{Hyperbolic}(\eta, \rho^2)$.
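
The scale-mixture representation in Proposition 1 can be checked numerically. The sketch below is illustrative only and is not the authors' code. It assumes that $\mathrm{GIG}(1, \eta, \eta)$ has density proportional to $\exp\{-\tfrac{\eta}{2}(x + 1/x)\}$ on $x > 0$, and that $\mathrm{Hyperbolic}(\eta, \rho^2)$ has density proportional to $\exp\{-\eta\sqrt{1 + A^2/(\eta\rho^2)}\}$; both parameterizations are assumptions, since the statement above does not spell them out. The function names are hypothetical. Under these assumptions, the variance of the simulated mixture should agree with the variance computed directly from the hyperbolic kernel.

```python
import numpy as np


def sample_gig_unit(eta, size, rng):
    """Draw a^2 with density proportional to exp(-(eta/2) * (x + 1/x)), x > 0.

    ASSUMPTION: this is what GIG(1, eta, eta) denotes in Proposition 1.
    Rejection sampling: propose x from Exponential(rate = eta/2) and accept
    with probability exp(-eta / (2 * x)), which is always at most 1.
    """
    draws = np.empty(0)
    while draws.size < size:
        x = rng.exponential(scale=2.0 / eta, size=2 * size)
        x = x[x > 0]  # drop exact zeros to avoid division by zero
        accept = rng.random(x.size) < np.exp(-eta / (2.0 * x))
        draws = np.concatenate([draws, x[accept]])
    return draws[:size]


def sample_scale_mixture(eta, rho, size, rng):
    """Draw A via A | a^2 ~ N(0, rho^2 * a^2) with a^2 from sample_gig_unit."""
    a2 = sample_gig_unit(eta, size, rng)
    return rng.normal(0.0, rho * np.sqrt(a2))


def hyperbolic_kernel_variance(eta, rho, n_grid=400_001):
    """Variance of the density proportional to
    exp(-eta * sqrt(1 + A^2 / (eta * rho^2))), by a Riemann sum on a uniform grid.

    ASSUMPTION: this kernel is the Hyperbolic(eta, rho^2) density up to a constant.
    """
    half_width = 80.0 * rho / np.sqrt(eta)  # kernel is negligible beyond this
    grid = np.linspace(-half_width, half_width, n_grid)
    kernel = np.exp(-eta * np.sqrt(1.0 + grid**2 / (eta * rho**2)))
    # The grid spacing cancels in the ratio, and the mean is zero by symmetry.
    return np.sum(grid**2 * kernel) / np.sum(kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eta, rho = 1.0, 2.0  # illustrative values, not taken from the paper
    draws = sample_scale_mixture(eta, rho, 200_000, rng)
    print("Monte Carlo variance of A:   ", draws.var())
    print("Variance from kernel (grid): ", hyperbolic_kernel_variance(eta, rho))
```

The exponential proposal keeps the sampler short, but its acceptance rate falls quickly as $\eta$ grows, so a dedicated GIG sampler would be preferable outside a quick check like this one.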

Figures (6)

  • Figure 1: RMSEs of all regression coefficients and the expected values of the response variables for the proposed GECM-HEM method and different versions of G-HEM under true hyperbolic errors in Scenario (I). The boxplots are based on 100 replicates.
  • Figure 2: Predictive performance of the proposed GECM-HEM method and different versions of G-HEM under true hyperbolic errors in Scenario (I). In the middle plot, the dashed line marks a coverage of 90%. The boxplots are based on 100 replicates.
  • Figure 3: RMSEs of all regression coefficients and the expected values of the response variables for the proposed GECM-HEM method and the other studied MAP-based methods under different scenarios. The boxplots are based on 100 replications.
  • Figure 4: Predictive performance of the proposed GECM-HEM method and the other studied MAP-based methods under different scenarios. The boxplots are based on 100 replications.
  • Figure 5: Predictive performance of the proposed GECM-HEM method and the other studied MAP-based methods for different real datasets. In the plots for empirical coverage probability, the dashed lines mark a coverage of 90%. The boxplots for the Boston Housing and Faculty Salaries datasets are based on 100 replications. The boxplots for the Ames Housing dataset are based on the 11 replications for which SSQLASSO could be successfully executed.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Proposition 1
  • proof