Stochastic EM Estimation and Inference for Zero-Inflated Beta-Binomial Mixed Models for Longitudinal Count Data

John Barrera, Ana Arribas-Gil, Dae-Jin Lee, Cristian Meza

TL;DR

This work tackles longitudinal count data with excess zeros and overdispersion by proposing the Zero-Inflated Beta-Binomial Mixed Effects Regression (ZIBBMR), which combines a zero-inflation component with a Beta-Binomial count model and subject-specific random effects. Estimation is carried out via the Stochastic Approximation EM (SAEM) algorithm with latent-variable augmentation, enabling likelihood-based inference despite an intractable likelihood. Through extensive simulations, ZIBBMR–SAEM demonstrates accurate parameter recovery and robust inference for variance components, particularly in small-sample settings, and outperforms competing methods in challenging scenarios; in a microbiome application, it provides complementary insights to a ZIBR benchmark while remaining more stable than some standard GLMM approaches. An accompanying R implementation facilitates fitting ZIBBMR and comparing it against ZIBR, highlighting the practical utility of jointly modeling counts and overdispersion in zero-inflated longitudinal data for biomedical research.

Abstract

Analyzing overdispersed, zero-inflated, longitudinal count data poses significant modeling and computational challenges, which standard count models (e.g., Poisson or negative binomial mixed effects models) fail to adequately address. We propose a Zero-Inflated Beta-Binomial Mixed Effects Regression (ZIBBMR) model that augments a beta-binomial count model with a zero-inflation component, fixed effects for covariates, and subject-specific random effects, accommodating excessive zeros, overdispersion, and within-subject correlation. Maximum likelihood estimation is performed via a Stochastic Approximation EM (SAEM) algorithm with latent variable augmentation, which circumvents the model's intractable likelihood and enables efficient computation. Simulation studies show that ZIBBMR achieves accuracy comparable to leading mixed-model approaches in the literature and surpasses simpler zero-inflated count formulations, particularly in small-sample scenarios. As a case study, we analyze longitudinal microbiome data, comparing ZIBBMR with an external Zero-Inflated Beta Regression (ZIBR) benchmark; the results indicate that applying both count- and proportion-based models in parallel can enhance inference robustness when both data types are available.
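To make the model structure described in the abstract concrete, the following is a minimal simulation sketch of a zero-inflated beta-binomial process with subject-specific random intercepts. The parameter names ($a$, $\alpha$, $b$, $\beta$, $\sigma_1$, $\sigma_2$, $\phi$) mirror those listed in the figure captions, but the exact link functions, the single time covariate, the mean/dispersion parameterization of the beta distribution, and the function name `simulate_subject` are assumptions made for illustration, not the authors' specification or their R implementation.

```python
import math
import random


def expit(x):
    """Inverse logit, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_subject(rng, times, n_trials, a, alpha, b, beta, sigma1, sigma2, phi):
    """Simulate one subject's counts from a zero-inflated beta-binomial sketch.

    Assumed (hypothetical) structure, not taken from the paper:
      a_i ~ N(a, sigma1^2), b_i ~ N(b, sigma2^2)    subject random intercepts
      P(structural zero at t) = expit(a_i + alpha * t)
      mu_t = expit(b_i + beta * t)                   beta-binomial mean
      pi_t ~ Beta(mu_t * phi, (1 - mu_t) * phi)      phi controls overdispersion
      y_t | pi_t ~ Binomial(n_trials, pi_t)
    """
    a_i = rng.gauss(a, sigma1)  # sigma1, sigma2 are standard deviations here
    b_i = rng.gauss(b, sigma2)
    counts = []
    for t in times:
        if rng.random() < expit(a_i + alpha * t):
            counts.append(0)  # structural zero from the zero-inflation part
            continue
        mu = expit(b_i + beta * t)
        pi = rng.betavariate(mu * phi, (1.0 - mu) * phi)
        # Binomial draw as a sum of Bernoulli trials (works on any Python 3).
        y = sum(1 for _ in range(n_trials) if rng.random() < pi)
        counts.append(y)
    return counts


rng = random.Random(0)
times = [0, 1, 2, 3]
data = [
    simulate_subject(rng, times, n_trials=50,
                     a=-0.5, alpha=0.2, b=-1.0, beta=0.3,
                     sigma1=0.7, sigma2=0.5, phi=5.0)
    for _ in range(20)
]
zero_share = sum(y == 0 for row in data for y in row) / (len(data) * len(times))
print(f"share of zeros: {zero_share:.2f}")
```

Zeros in this sketch come from two sources: the structural-zero branch and the beta-binomial draw itself. Once the random intercepts are unobserved, those two sources cannot be separated in closed form, which is why the marginal likelihood is hard to evaluate directly.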

Paper Structure

This paper contains 26 sections, 22 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Empirical densities of parameter estimates in Setting 1 for the proposed SAEM routine, glmmTMB, and gamlss. Parameters are grouped into: zero-inflation component ($a$, $\alpha$), mean component ($b$, $\beta$), and variance and dispersion component parameters ($\sigma_1^2$, $\sigma_2^2$, $\phi$). The vertical dotted lines indicate the true parameter values.
  • Figure 2: Empirical densities of parameter estimates in Setting 2 for the proposed SAEM routine, glmmTMB, and gamlss. Grouping and interpretation of parameters follow Figure 1.
  • Figure 3: Empirical densities of parameter estimates in Setting 3 for the proposed SAEM routine, glmmTMB, and gamlss. Grouping and interpretation of parameters follow Figure 1.
  • Figure 4: Alpha-diversity indices for pregnant and non-pregnant women.