A Bayesian decision-theoretic approach to sparse estimation

Aihua Li, Surya T. Tokdar, Jason Xu

TL;DR

This paper tackles sparse estimation in high-dimensional linear regression by marrying Bayesian shrinkage with penalized least squares in a decision-theoretic framework. It introduces Bayesian Decoupling (bd) and adaptive thresholding via the adaptive probability model (apm), together with reweighted $l_1$ penalties (fd and is) to achieve simultaneous bias reduction and convexity. A posterior benchmarking criterion selects the tuning parameter adaptively, producing improved solution paths that favor true signals early while limiting false discoveries, particularly under predictor correlation. Empirical results on simulations and an eQTL application demonstrate sparser models with strong predictive performance, highlighting the method's practical value for genomics and other high-dimensional domains.

Abstract

We extend the work of Hahn and Carvalho (2015) and develop a doubly-regularized sparse regression estimator by synthesizing Bayesian regularization with penalized least squares within a decision-theoretic framework. In contrast to existing Bayesian decision-theoretic formulations, which rely chiefly on the symmetric 0-1 loss, the new method, which we call Bayesian Decoupling, employs a family of penalized loss functions indexed by a sparsity-tuning parameter. We propose a class of reweighted $l_1$ penalties, with two specific instances that achieve simultaneous bias reduction and convexity. The design of the penalties incorporates considerations of signal sizes, as enabled by the Bayesian paradigm. The tuning parameter is selected using a posterior benchmarking criterion, which quantifies the drop in predictive power relative to the posterior mean, the optimal Bayes estimator under squared error loss. Additionally, in contrast to the widely used median probability model technique, which selects variables by thresholding posterior inclusion probabilities at the fixed threshold of 1/2, Bayesian Decoupling enables the use of a data-driven threshold that automatically adapts to estimated signal sizes and offers far better performance in high-dimensional settings with highly correlated predictors. Our numerical results in such settings show that certain combinations of priors and loss functions significantly improve the solution path compared to existing methods, prioritizing true signals early along the path before false signals are selected. Consequently, Bayesian Decoupling produces estimates with better prediction and selection performance. Finally, a real data application illustrates the practical advantages of our approaches, which select sparser models with larger coefficient estimates.
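To make the two selection rules contrasted in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the decoupling step takes the form of a weighted $l_1$-penalized least-squares fit to the posterior mean prediction, in the spirit of Hahn and Carvalho (2015), solved here by plain coordinate descent. The function names, the `weights` argument, and the solver are illustrative assumptions; the paper's specific fd and is penalties are not defined in this excerpt and are not reproduced.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def decoupled_weighted_lasso(X, beta_bar, lam, weights=None, n_iter=200):
    """Hypothetical helper: minimize over gamma
        0.5 * ||X @ beta_bar - X @ gamma||^2 + lam * sum_j weights[j] * |gamma[j]|
    by cyclic coordinate descent. `beta_bar` stands for a posterior mean
    obtained elsewhere (e.g. from MCMC); `weights` is a placeholder for
    whatever reweighting scheme is used (all ones gives the plain l1 penalty).
    """
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    gamma = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)      # squared column norms
    resid = X @ beta_bar               # target minus current fit (gamma = 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue               # an all-zero column stays at gamma_j = 0
            resid += X[:, j] * gamma[j]            # remove coordinate j from the fit
            z = X[:, j] @ resid                    # partial correlation with residual
            gamma[j] = soft_threshold(z, lam * w[j]) / col_sq[j]
            resid -= X[:, j] * gamma[j]            # add the updated coordinate back
    return gamma


def median_probability_model(pip, threshold=0.5):
    """Indices of predictors whose posterior inclusion probability is at
    least `threshold`; 0.5 gives the median probability model."""
    return np.flatnonzero(np.asarray(pip) >= threshold)


# Toy usage with an orthonormal design, where the solution is exact
# soft-thresholding of the posterior mean.
X = np.eye(3)
beta_bar = np.array([3.0, 0.5, -2.0])
gamma_hat = decoupled_weighted_lasso(X, beta_bar, lam=1.0)
selected = median_probability_model([0.9, 0.5, 0.2])
```

In this toy case the penalized fit zeroes out the small coefficient and shrinks the large ones, whereas the fixed 1/2 cutoff looks only at inclusion probabilities and ignores signal size. A data-driven threshold of the kind described in the abstract would replace the constant `threshold` with a quantity that depends on estimated signal sizes.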

Paper Structure

This paper contains 11 sections, 19 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Shrinkage function of bd under the spike-and-slab prior with fixed prior parameters $\pi_0=0.5,\sigma^2=1$. The parameters used for illustration are $\lambda=2$ for the $l_1$ penalty, $\lambda=6$ for the fd penalty, $\lambda=0.01$ and $\epsilon=0.001$ for the is penalty. Each plot shows a 45-degree dotted reference line.
  • Figure 2: Posterior expected fd penalty function, under the spike-and-slab prior with parameters $\pi_0=0.5,\sigma^2=1,n=1$.
  • Figure 3: Squared prediction error $\mathcal{E}_\lambda$ (left) and solution path (right) of bd with fd penalty, shown for one synthetic data set with $n=50,p=30,s^*=20,k=10,\rho=0$. The plot of $\mathcal{E}_\lambda$ shows the posterior means of $\mathcal{E}_\lambda$ as dots and the 90% credible intervals as bars; the horizontal dotted line represents the benchmark $\mathcal{E}$. The plot of $\hat{\beta}_{j,\lambda}$ draws each coefficient as a separate line.
  • Figure 4: Posterior inclusion probabilities across all predictors on synthetic data with $n=200$, $k=20$, $s^*=20$, $p=200,400,2000$, and $\rho=0.3,0.95$. Each plot is labeled by the number of predictors with $\textsc{pip}\ge 0.5$.
  • Figure 5: Shrinkage function of apm under the spike-and-slab prior, with parameters $\pi_0=0.5,\sigma^2=1,n=1,\lambda=0.6$.
  • ...and 5 more figures