Tail-adaptive Bayesian shrinkage

Se Yoon Lee, Peng Zhao, Debdeep Pati, Bani K. Mallick

Abstract

Robust Bayesian methods for high-dimensional regression problems under diverse sparse regimes are studied. Traditional shrinkage priors are primarily designed to detect a handful of signals from tens of thousands of predictors in the so-called ultra-sparsity domain. However, they may not perform well when the degree of sparsity is moderate. In this paper, we propose a robust sparse estimation method under diverse sparsity regimes, which has a tail-adaptive shrinkage property. Under this property, the tail-heaviness of the prior adjusts adaptively, becoming larger or smaller as the sparsity level increases or decreases, respectively, to accommodate more or fewer signals, a posteriori. We propose a global-local-tail (GLT) Gaussian mixture distribution that ensures this property. We examine the role of the tail-index of the prior in relation to the underlying sparsity level and demonstrate that the GLT posterior contracts at the minimax optimal rate for sparse normal mean models. We apply both the GLT prior and the Horseshoe prior to a real data problem and simulation examples. Our findings indicate that the varying tail rule based on the GLT prior offers advantages over a fixed tail rule based on the Horseshoe prior in diverse sparsity regimes.

Paper Structure

This paper contains 55 sections, 12 theorems, 85 equations, 12 figures, 1 table, 1 algorithm.

Key Result

Proposition 2.2

Assume $\beta | \lambda, \tau \sim \mathcal{N}_{1}(0, \lambda^{2}\tau^{2})$, $\lambda \sim C^{+}(0,1)$, and $\tau>0$. Then the tail-index of $\pi_{\text{HS}}(\beta | \tau)$ is $\alpha = 1$ for any $\tau > 0$.
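The proposition can be checked informally by simulation. The sketch below draws from the Horseshoe hierarchy stated above and estimates the tail index with a Hill estimator. It assumes the tail index is meant in the usual regular-variation sense, $P(|\beta| > t) \approx c\, t^{-\alpha}$ for large $t$. The Hill estimator, the function names, the sample size, and the number of upper order statistics `k` are illustrative choices made here; none of them come from the paper.

```python
import numpy as np


def sample_horseshoe(n, tau, rng):
    """Draw n values of beta with beta | lambda, tau ~ N(0, lambda^2 tau^2), lambda ~ C+(0, 1)."""
    lam = np.abs(rng.standard_cauchy(n))  # half-Cauchy local scales
    return tau * lam * rng.standard_normal(n)


def hill_tail_index(x, k):
    """Hill estimate of the tail index from the k largest values of |x| (k is a tuning choice)."""
    a = np.sort(np.abs(x))
    threshold = a[-k - 1]                 # (k+1)-th largest absolute value
    top = a[-k:]                          # k largest absolute values
    return 1.0 / np.mean(np.log(top / threshold))


rng = np.random.default_rng(0)
for tau in (1.0, 0.001):
    beta = sample_horseshoe(200_000, tau, rng)
    print(f"tau = {tau}: estimated tail index = {hill_tail_index(beta, 2_000):.3f}")
```

Because the Hill estimator depends only on ratios of order statistics, rescaling by $\tau$ does not change it. This matches the statement that the tail index does not depend on $\tau$. The estimates should land close to 1, up to Monte Carlo error.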

Figures (12)

  • Figure 1: Comparison between the marginal densities of the Horseshoe and GLT prior ($\pi_{\text{HS}}(\beta|\tau)$ and $\pi(\beta|\tau, \xi)$). The global-scale parameter $\tau$ is set to $\tau=1$ (Panels (a) and (b)) and $\tau = 0.001$ (Panels (c) and (d)). The density $\pi_{\text{HS}}(\beta|\tau)$ is depicted in black, while the densities $\pi(\beta|\tau, \xi)$ are shown in red ($\xi=1$), green ($\xi=1.5$), blue ($\xi=2$), and violet ($\xi=3$), respectively.
  • Figure 2: Comparison between two densities of the random shrinkage coefficient ($\pi_{\text{HS}}(\kappa|\tau)$ and $\pi(\kappa|\tau, \xi)$). The global-scale parameter $\tau$ is set to $\tau=1$ (Panels (a), (b), and (c)) and $\tau = 0.001$ (Panels (d), (e), and (f)). The density $\pi_{\text{HS}}(\kappa|\tau)$ is depicted in black, while the densities $\pi(\kappa|\tau, \xi)$ are shown in red ($\xi=1$), green ($\xi=1.5$), blue ($\xi=2$), and violet ($\xi=3$), respectively.
  • Figure 3: Comparisons between the optimal value $\xi$ (up to a multiplicative constant) (Panel (a)) and the posterior mean $\widehat{\xi} = \mathbb{E}[\xi|\textbf{y}]$ (Panel (b)) under sparse normal mean models. $x$-axis of the panels represents the sparsity level $s=q/n$ ranging from around 0.001 to 0.2.
  • Figure 4: Histogram of $z$-values $\{y_{j} \}_{j=1}^{6033}$ obtained from prostate cancer data (Panel (a)) and the Q-Q plot (Panel (b)).
  • Figure 5: An idealized reversed-$S$-shape curve (Panel (a)) is formed by pairs $\{(y_{j},\widehat{\beta}_{j})\}_{j=1}^{p}$ when the Horseshoe estimator achieves the robustness property. The posterior inference results (Panel (b)) are obtained using the Horseshoe prior on the seven datasets $\mathcal{P}_{l}$ ($l=1,\cdots,7$). The dotted line represents $y=x$. The posterior means of $\tau$ are $0.158$ ($\mathcal{P}_{1}$) and $0.018$ ($\mathcal{P}_{2}$), and numerically zero for $\mathcal{P}_{l}$ ($l=3,\cdots,7$).
  • ...and 7 more figures

Theorems & Definitions (16)

  • Definition 2.1
  • Proposition 2.2
  • Proposition 4.1
  • Corollary 4.2
  • Proposition 4.3
  • Lemma 4.4
  • Theorem 4.5: MSE for the GLT estimator
  • Theorem 4.6: Spread of the GLT posterior
  • Theorem 4.7: Posterior contraction
  • Lemma A.2.1: Marginal density and random shrinkage coefficient of the Horseshoe
  • ...and 6 more