Bayesian structured additive quantile regression for inflated bounded data

Francisco F. Queiroz, Johannes Brachem, Paul F. V. Wiemann, Thomas Kneib

Abstract

Bounded continuous data on the unit interval frequently arise in applied fields and often exhibit a non-negligible proportion of observations at the boundaries. Inflated regression models address this feature by combining a continuous distribution on the unit interval with a discrete component to account for zero- and/or one-inflation. In this paper, we propose a class of Bayesian structured additive quantile regression models for inflated bounded continuous data that accommodates zero- and/or one-inflation. The proposed approach enables direct modeling of both the conditional quantiles of the continuous component and the probabilities of observing zeros and/or ones, with structured additive predictors incorporated in both parts, including nonlinear effects, spatial effects, random effects, and varying-coefficient terms. Posterior inference is carried out using Markov chain Monte Carlo algorithms implemented through the software Liesel, a probabilistic programming framework for semiparametric regression. The practical performance of the proposed models is illustrated through simulation studies and two real-data applications: one analyzing the proportion of traffic-related fatalities across Brazilian municipal districts, and another evaluating speech intelligibility in cochlear implant recipients under different experimental conditions.

Paper Structure

This paper contains 9 sections, 1 theorem, 23 equations, 10 figures, 1 table.

Key Result

Proposition 2.1

Let $z \sim \text{N}(0,1)$ and $w \sim \text{Exp}(\delta^2)$ be two independent random variables following a standard normal and exponential distribution with rate parameter $\delta^2$, respectively. Then, where $\xi = (1-2\tau)/[\tau(1-\tau)]$ and $\sigma^2 = 2/[\tau(1-\tau)]$.
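
A mixture of this kind can be checked numerically. The sketch below is a minimal illustration under an assumed parameterization, not the paper's own code: it takes the asymmetric Laplace density to be $\tau(1-\tau)\,\delta^2 \exp\{-\delta^2 \rho_\tau(y-\mu)\}$ with check function $\rho_\tau(u) = u\,(\tau - I(u<0))$, and draws $y = \mu + \xi w + \sigma\sqrt{w/\delta^2}\,z$. The factor $1/\delta$ on the normal term, the values of $\mu$, $\delta^2$, $\tau$, and the function names are assumptions made for illustration. Under this scaling, $\mu$ should be the $\tau$th quantile of $y$, which the simulation checks empirically.

```python
import math
import random


def sample_ald_mixture(mu, delta2, tau, n, seed=0):
    """Draw n values from the normal-exponential mixture.

    Assumed scaling (not confirmed by the source):
    y = mu + xi * w + sigma * sqrt(w / delta2) * z,
    with w ~ Exp(rate = delta2) and z ~ N(0, 1) independent.
    """
    rng = random.Random(seed)
    xi = (1 - 2 * tau) / (tau * (1 - tau))
    sigma = math.sqrt(2 / (tau * (1 - tau)))
    draws = []
    for _ in range(n):
        w = rng.expovariate(delta2)  # rate delta2, so E[w] = 1 / delta2
        z = rng.gauss(0.0, 1.0)      # standard normal draw
        draws.append(mu + xi * w + sigma * math.sqrt(w / delta2) * z)
    return draws


def empirical_cdf_at(draws, point):
    """Share of draws that are less than or equal to `point`."""
    return sum(d <= point for d in draws) / len(draws)


# Hypothetical parameter values, chosen only for the check.
draws = sample_ald_mixture(mu=0.3, delta2=4.0, tau=0.25, n=100_000)
print("P(y <= mu) estimate:", empirical_cdf_at(draws, 0.3))
print("sample mean:", sum(draws) / len(draws))
```

If the assumed scaling holds, the first printed value should be close to $\tau$ and the sample mean close to $\mu + \xi/\delta^2$. This conditional normality given $w$ is what makes Gibbs-type updates convenient in Bayesian quantile regression.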

Figures (10)

  • Figure 1: Boxplot of the RMSE across all posterior samples for each covariate effect of the discrete part for Scenario S1 (top row) and Scenario S2 (bottom row).
  • Figure 2: Boxplot of the RMSE across all posterior samples for the predicted quantiles considering scenarios S1 (left) and S2 (right).
  • Figure 3: Coverage rate of the $95\%$ confidence interval for the $\tau$th quantile, $p_{0i}$, and $p_{1i}$ for scenarios S1 (top row) and S2 (bottom row). The dashed gray line indicates the nominal $95\%$ coverage level.
  • Figure 4: Spatial distribution of the sample quantiles of $y$ across Brazilian states (UF): 10th percentile ($\tau = 0.1$, left panel), median ($\tau = 0.5$, center panel), and 90th percentile ($\tau = 0.9$, right panel).
  • Figure 5: Plots of the estimated effects of the continuous part. The top row shows the estimated nonlinear effects along with the $90\%$ credible intervals. For visualization purposes, the credible intervals are only plotted for the $10\%$, $50\%$, and $90\%$ quantiles. The middle and bottom rows present the estimated spatial effects for different quantiles. States whose $90\%$ credible intervals for the spatial effects include zero (nonsignificant) are shown in gray.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Proposition 2.1: Location-scale mixture representation of the ALD
  • Remark 2.1