Addressing outliers in mixed-effects logistic regression: a more robust modeling approach
Divan A. Burger, Sean van der Merwe, Emmanuel Lesaffre
TL;DR
The paper develops a robust Bayesian approach for hierarchical bounded-count data by introducing a binomial-logit-t model with a $t$-distributed latent variable acting as an observation-level random effect. This framework addresses both overdispersion and outliers, yielding a closed-form pseudo-median for interpretation and accommodating mean predictions via numerical integration. Applied to longitudinal medication adherence data and supported by simulation studies, the binomial-logit-t model demonstrates superior fit and resilience to outliers compared with standard binomial, beta-binomial, and binomial-logit-normal models. The work provides practical guidance for analyzing bounded counts in health data and contributes to the broader toolbox for robust mixed-effects modeling in the presence of anomalies.
Abstract
This study introduces an outlier-robust model for analyzing hierarchically structured bounded count data within a Bayesian framework, using a logistic regression approach implemented in JAGS. Our model incorporates a t-distributed latent variable to address overdispersion and outliers, improving robustness compared to conventional models such as the standard binomial, beta-binomial, and binomial-logit-normal models. Notably, our model targets a pseudo-median that differs from the true discrete median by less than one count; this closed-form quantity provides a robust and interpretable measure of central tendency. For comparability across all models, we also make predictions based on the mean proportion; however, this requires integrating over the t-distributed nuisance parameter. Little of the existing literature specifically addresses outliers in mixed models for bounded count data, and this research helps fill that gap. The practical utility of the model is demonstrated using a longitudinal medication adherence dataset, where patient behavior often results in abrupt changes and outliers within individual trajectories. A simulation study demonstrates the binomial-logit-t model's strong performance, with comparison statistics favoring it among the four evaluated models. An additional data contamination simulation confirms its robustness against outliers. Our robust approach maintains the integrity of the dataset, effectively handling outliers to provide more accurate and reliable parameter estimates.
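The contrast between the closed-form pseudo-median and the integrated mean proportion can be illustrated with a minimal Python sketch. It is not the authors' JAGS implementation. It assumes a parameterization in which the success probability is the inverse logit of `eta + sigma * e`, with `e` following a standard t distribution with `nu` degrees of freedom. The names `eta`, `sigma`, `nu`, and `n_trials` are illustrative assumptions, and the paper's exact specification may differ. Under that assumption the latent term has median zero, so the pseudo-median is taken to be the number of trials times the inverse logit of the linear predictor, while the mean proportion needs a one-dimensional numerical integral.

```python
import math


def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def t_pdf(u: float, nu: float) -> float:
    """Density of a standard Student t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(u * u / nu))


def pseudo_median(eta: float, n_trials: int) -> float:
    """Closed-form pseudo-median count (assumed form: n * expit(eta)).

    Relies on the latent t term having median zero and expit being monotone.
    """
    return n_trials * expit(eta)


def mean_proportion(eta: float, sigma: float, nu: float,
                    n_grid: int = 20000) -> float:
    """Mean of expit(eta + sigma * e) for e ~ t_nu, by numerical integration.

    Uses the substitution u = tan(theta) to map the real line onto
    (-pi/2, pi/2), then a midpoint rule so the endpoints are never evaluated.
    Intended for nu > 1 (assumption of this sketch).
    """
    h = math.pi / n_grid
    total = 0.0
    for k in range(n_grid):
        theta = -math.pi / 2 + (k + 0.5) * h
        u = math.tan(theta)
        # (1 + u^2) is the Jacobian du/dtheta of the tan substitution.
        total += expit(eta + sigma * u) * t_pdf(u, nu) * (1.0 + u * u)
    return total * h


if __name__ == "__main__":
    eta, sigma, nu, n_trials = 1.0, 1.0, 3.0, 30  # hypothetical values
    print("pseudo-median count:", pseudo_median(eta, n_trials))
    print("mean count:", n_trials * mean_proportion(eta, sigma, nu))
```

In this sketch the mean proportion is pulled toward 0.5 relative to `expit(eta)`, because the symmetric latent term is passed through a nonlinear link. This is one way to see why a mean-based prediction and a median-based summary need not coincide.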
