Handling bounded response in high dimensions: a Horseshoe prior Bayesian Beta regression approach

The Tien Mai

TL;DR

This work extends Bayesian Beta regression to high-dimensional sparse settings by combining a tempered (fractional) posterior with a Horseshoe prior, enabling robust variable selection for bounded responses. It introduces a Gibbs sampler based on Pólya-Gamma augmentation for efficient posterior inference and proves the first posterior-consistency and convergence-rate results for Bayesian Beta regression. Empirical results from simulations and a GPA dataset application show better estimation accuracy, predictive performance, and variable selection than standard Beta regression and transformed Lasso. The approach is implemented in the R package betaregbayes, facilitating broad application to bounded outcomes across disciplines.

Abstract

Bounded continuous responses, such as proportions, arise frequently in diverse scientific fields including climatology, biostatistics, and finance. Beta regression is a widely adopted framework for modeling such data, owing to the flexibility of the Beta distribution over the unit interval. While Bayesian extensions of Beta regression have shown promise, existing methods are limited to low-dimensional settings and lack theoretical guarantees. In this work, we propose a novel Bayesian framework for high-dimensional sparse Beta regression that employs a tempered posterior. Our method incorporates the Horseshoe prior for effective shrinkage and variable selection. Most notably, we propose a novel Gibbs sampling algorithm using Pólya-Gamma augmentation for efficient inference in the Beta regression model. We also provide the first theoretical results establishing posterior consistency and convergence rates for Bayesian Beta regression. Through extensive simulation studies in both low- and high-dimensional scenarios, we demonstrate that our approach outperforms existing alternatives, offering improved estimation accuracy and model interpretability. Our method is implemented in the R package "betaregbayes", available on GitHub.
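
The combination of a tempered posterior and a Horseshoe prior described in the abstract can be illustrated with a minimal Python sketch of the unnormalized log posterior. It is not the betaregbayes implementation and not the Pólya-Gamma Gibbs sampler. The logit mean link, the fixed precision `phi`, the half-Cauchy(0, 1) scales, and all function names are assumptions made for illustration.

```python
import math


def beta_loglik(y, X, beta, phi):
    """Beta regression log-likelihood, assuming a logit link for the mean
    and a known precision phi (both are assumptions of this sketch)."""
    total = 0.0
    for yi, xi in zip(y, X):
        eta = sum(xj * bj for xj, bj in zip(xi, beta))
        mu = 1.0 / (1.0 + math.exp(-eta))          # mean in (0, 1)
        a, b = mu * phi, (1.0 - mu) * phi          # Beta shape parameters
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi)
                  + (b - 1.0) * math.log(1.0 - yi))
    return total


def horseshoe_log_prior(beta, lam, tau):
    """Horseshoe log prior up to an additive constant:
    beta_j ~ N(0, lam_j^2 tau^2), lam_j ~ C+(0, 1), tau ~ C+(0, 1)."""
    lp = -math.log(1.0 + tau ** 2)                 # half-Cauchy on the global scale
    for bj, lj in zip(beta, lam):
        sd = lj * tau
        lp += -math.log(sd) - 0.5 * (bj / sd) ** 2  # Gaussian slab
        lp += -math.log(1.0 + lj ** 2)              # half-Cauchy on the local scale
    return lp


def tempered_log_posterior(y, X, beta, phi, lam, tau, alpha=0.5):
    """Unnormalized log tempered posterior: the likelihood is raised to the
    power alpha in (0, 1) before being combined with the prior."""
    return alpha * beta_loglik(y, X, beta, phi) + horseshoe_log_prior(beta, lam, tau)
```

Setting `alpha` close to 1 approaches the usual posterior, while smaller values down-weight the likelihood relative to the prior.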

Paper Structure

This paper contains 15 sections, 8 theorems, 33 equations, 2 figures, 4 tables.

Key Result

Theorem 1

For any $\alpha\in(0,1)$, assume that Assumptions assum_grow_p, assum_beta0_bounded, asmum_random_design and asmum_finite_mean hold. We have that where $\varepsilon_n = K s^* \log(p/s^*) / n$, for some numerical constant $K>0$ depending only on $C_1, C_2, C_{\rm x}$.
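
To get a feel for how the rate $\varepsilon_n$ in Theorem 1 scales with the sparsity $s^*$, the dimension $p$, and the sample size $n$, the following Python sketch evaluates the formula directly. The constant $K$ is not given numerically in the text, so `K=1.0` is a placeholder assumption, and the function name is hypothetical.

```python
import math


def contraction_rate(s_star, p, n, K=1.0):
    """Evaluate eps_n = K * s_star * log(p / s_star) / n.

    K = 1.0 is a placeholder: the theorem only states that K > 0 depends
    on the constants in the assumptions.
    """
    if not 0 < s_star <= p:
        raise ValueError("need 0 < s_star <= p")
    if n <= 0:
        raise ValueError("need n > 0")
    return K * s_star * math.log(p / s_star) / n


# Example: 10 active covariates out of 1000, with 500 observations.
rate = contraction_rate(s_star=10, p=1000, n=500)
```

The rate grows only logarithmically in $p$ but linearly in $s^*$, and it shrinks in proportion to $1/n$, so doubling the sample size halves $\varepsilon_n$ under this formula.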

Figures (2)

  • Figure 1: Trace plots from the Gibbs sampler for selected parameter entries. Top row: three randomly chosen entries with true value 1. Middle row: three randomly chosen entries with true value $-1$. Bottom row: three randomly chosen entries with true value 0. Red lines show the true values.
  • Figure 2: ACF plots from the Gibbs sampler for the same randomly chosen entries as in Figure 1. Top row: three entries with true value 1. Middle row: three entries with true value $-1$. Bottom row: three entries with true value 0.
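
The quantity shown in an ACF plot such as Figure 2 is the sample autocorrelation of a single parameter's chain at increasing lags. A minimal Python sketch of that computation follows. The function name and the toy chain are illustrative assumptions, not the code used to produce the paper's figures.

```python
def autocorrelation(chain, max_lag):
    """Sample autocorrelation of an MCMC chain at lags 0..max_lag.

    Uses the standard estimator: lag-k autocovariance divided by the
    lag-0 autocovariance, both normalized by the full chain length.
    """
    n = len(chain)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the chain length")
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered)
    if var == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


# Toy chain that alternates sign, so lag-1 autocorrelation is negative.
acf = autocorrelation([1.0, -1.0, 1.0, -1.0], max_lag=1)
```

Autocorrelations that decay quickly toward zero indicate that successive draws are close to independent, which is what one hopes to see for a well-mixing sampler.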

Theorems & Definitions (8)

  • Theorem 1
  • Proposition 4.1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 2: Theorem 2.6 in alquier2020concentration
  • Theorem 3: Corollary 2.5 in alquier2020concentration
  • Lemma 4: Lemma 3 in mai2024concentration