Stochastic EM Estimation and Inference for Zero-Inflated Beta-Binomial Mixed Models for Longitudinal Count Data
John Barrera, Ana Arribas-Gil, Dae-Jin Lee, Cristian Meza
TL;DR
This work tackles longitudinal count data with excess zeros and overdispersion by proposing the Zero-Inflated Beta-Binomial Mixed Effects Regression (ZIBBMR) model, which combines a zero-inflation component with a beta-binomial count model and subject-specific random effects. Estimation is carried out via the Stochastic Approximation EM (SAEM) algorithm with latent-variable augmentation, enabling likelihood-based inference despite an intractable likelihood. In extensive simulations, ZIBBMR fitted with SAEM recovers parameters accurately and gives robust inference for variance components, particularly in small-sample settings, and it outperforms competing methods in challenging scenarios. In a microbiome application, it provides insights complementary to those of a Zero-Inflated Beta Regression (ZIBR) benchmark while remaining more stable than some standard GLMM approaches. An accompanying R implementation facilitates fitting ZIBBMR and comparing it against ZIBR, highlighting the practical utility of jointly modeling zero inflation and overdispersion in longitudinal count data for biomedical research.
Abstract
Analyzing overdispersed, zero-inflated, longitudinal count data poses significant modeling and computational challenges that standard count models (e.g., Poisson or negative binomial mixed effects models) fail to address adequately. We propose a Zero-Inflated Beta-Binomial Mixed Effects Regression (ZIBBMR) model that augments a beta-binomial count model with a zero-inflation component, fixed effects for covariates, and subject-specific random effects, accommodating excess zeros, overdispersion, and within-subject correlation. Maximum likelihood estimation is performed via a Stochastic Approximation EM (SAEM) algorithm with latent-variable augmentation, which circumvents the model's intractable likelihood and enables efficient computation. Simulation studies show that ZIBBMR achieves accuracy comparable to leading mixed-model approaches in the literature and surpasses simpler zero-inflated count formulations, particularly in small-sample scenarios. As a case study, we analyze longitudinal microbiome data, comparing ZIBBMR with an external Zero-Inflated Beta Regression (ZIBR) benchmark; the results indicate that applying both count- and proportion-based models in parallel can enhance inference robustness when both data types are available.
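To make the data-generating structure described above concrete, the following is a minimal simulation sketch in Python (NumPy only). It is not the authors' R implementation, and several details are assumptions made purely for illustration: logit links for both components, a single binary subject-level covariate, independent Gaussian random intercepts in each component, a mean/precision parameterization of the beta distribution (`mu * phi`, `(1 - mu) * phi`), and all numerical parameter values. The function name `simulate_zibbmr` and its arguments are hypothetical.

```python
import numpy as np

def simulate_zibbmr(n_subjects=50, n_times=4, trials=1000, seed=0,
                    alpha=(-0.5, 0.5), beta=(-2.0, 0.3),
                    sigma_a=0.7, sigma_b=0.5, phi=5.0):
    """Simulate zero-inflated beta-binomial longitudinal counts.

    Assumed (illustrative) structure, not taken from the paper:
      logit P(structural zero) = alpha0 + alpha1 * x_i + a_i
      logit mu_i               = beta0  + beta1  * x_i + b_i
      y | not structural zero  ~ BetaBinomial(trials, mu*phi, (1-mu)*phi)
    Returns an integer array with columns (subject, time, x, y).
    """
    rng = np.random.default_rng(seed)
    expit = lambda z: 1.0 / (1.0 + np.exp(-z))

    x = rng.integers(0, 2, size=n_subjects)       # binary covariate per subject
    a = rng.normal(0.0, sigma_a, size=n_subjects)  # random intercept, zero part
    b = rng.normal(0.0, sigma_b, size=n_subjects)  # random intercept, count part

    rows = []
    for i in range(n_subjects):
        pi_zero = expit(alpha[0] + alpha[1] * x[i] + a[i])
        mu = expit(beta[0] + beta[1] * x[i] + b[i])
        for t in range(n_times):
            if rng.random() < pi_zero:
                y = 0  # structural zero from the zero-inflation component
            else:
                # Beta-binomial draw: success probability ~ Beta, count ~ Binomial
                p = rng.beta(mu * phi, (1.0 - mu) * phi)
                y = rng.binomial(trials, p)
            rows.append((i, t, x[i], y))
    return np.array(rows)

data = simulate_zibbmr()
zero_fraction = (data[:, 3] == 0).mean()
```

Sharing the random intercepts `a[i]` and `b[i]` across a subject's repeated measurements is what induces within-subject correlation in this sketch, while the beta-distributed success probability produces overdispersion relative to a plain binomial. The subject-level random effects and the zero/non-zero indicators are exactly the kind of unobserved quantities that latent-variable augmentation treats as missing data.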
