Testing-driven Variable Selection in Bayesian Modal Regression

Jiasong Duan, Hongmei Zhang, Xianzheng Huang

TL;DR

This work develops a Bayesian variable selection framework for modal regression with heavy-tailed, non-Gaussian errors. The error in the model $y=\beta_0+X\beta+\varepsilon$ is modeled as MixHat$(\nu,\gamma)$, with emphasis on selecting informative covariates in sparse, high-dimensional settings. The framework integrates a spike-and-slab prior, an EM algorithm for efficient estimation, and a novel testing-driven variable selection (TDVS) strategy based on the change-in-slope (CiS) statistic of the estimated error density, with a permutation-based significance test. In simulations, the method performs robustly under non-Gaussian errors and correlated covariates, outperforming Gaussian-focused methods (LASSO, SSL) in the balance of true and false positives and in estimation accuracy, especially when $p < n$, and remaining competitive when $p > n$ with cross-validated tuning. Applications to AML gene expression and lung methylation data show that TDVS can uncover informative covariates missed by Gaussian approaches and yield interpretable, data-driven error distributions, which makes it useful for robust covariate selection in heavy-tailed contexts.
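The permutation-based significance test mentioned above can be illustrated with a generic sketch. This is not the authors' procedure: the CiS statistic is not defined in this summary, so the code substitutes the absolute Pearson correlation as a stand-in statistic. The function names `abs_correlation` and `permutation_pvalue`, the Student-t error distribution, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np


def abs_correlation(x, y):
    """Stand-in test statistic (NOT the paper's CiS): |Pearson correlation|."""
    return abs(np.corrcoef(x, y)[0, 1])


def permutation_pvalue(x, y, statistic=abs_correlation, n_perm=999, seed=0):
    """One-sided permutation p-value for association between x and y.

    Permuting x breaks any link to y, so the permuted statistics
    approximate the null distribution of the statistic.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        if statistic(rng.permutation(x), y) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: one informative and one uninformative covariate.
rng = np.random.default_rng(1)
n = 100
x_signal = rng.normal(size=n)
x_noise = rng.normal(size=n)
# Heavy-tailed errors (Student t, 3 degrees of freedom), for illustration only.
y = 2.0 * x_signal + rng.standard_t(df=3, size=n)

p_signal = permutation_pvalue(x_signal, y)
p_noise = permutation_pvalue(x_noise, y)
print(f"informative covariate:   p = {p_signal:.3f}")
print(f"uninformative covariate: p = {p_noise:.3f}")
```

Any statistic that takes a covariate and the response can be passed through the `statistic` argument, so a CiS-style statistic could be swapped in once its definition is available from the paper.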

Abstract

We propose a Bayesian variable selection method in the framework of modal regression for heavy-tailed responses. An efficient expectation-maximization algorithm is employed to expedite parameter estimation. A test statistic is constructed to exploit the shape of the model error distribution to effectively separate informative covariates from unimportant ones. Through simulations, we demonstrate and evaluate the efficacy of the proposed method in identifying important covariates in the presence of non-Gaussian model errors. Finally, we apply the proposed method to analyze two datasets arising in genetic and epigenetic studies.

Paper Structure

This paper contains 17 sections, 8 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Probability density functions of $\varepsilon_i$ with parameters $\nu$ and $\gamma$ set at different values
  • Figure 2: Histograms of response data analyzed in the real data examples section, superimposed with kernel density estimates