Addressing outliers in mixed-effects logistic regression: a more robust modeling approach

Divan A. Burger, Sean van der Merwe, Emmanuel Lesaffre

TL;DR

The paper develops a robust Bayesian approach for hierarchical bounded-count data by introducing a binomial-logit-t model with a $t$-distributed latent variable acting as an observation-level random effect. This framework addresses both overdispersion and outliers, yielding a closed-form pseudo-median for interpretation and accommodating mean predictions via numerical integration. Applied to longitudinal medication adherence data and supported by simulation studies, the binomial-logit-t model demonstrates superior fit and resilience to outliers compared with standard binomial, beta-binomial, and binomial-logit-normal models. The work provides practical guidance for analyzing bounded counts in health data and contributes to the broader toolbox for robust mixed-effects modeling in the presence of anomalies.

Abstract

This study introduces an outlier-robust model for analyzing hierarchically structured bounded count data within a Bayesian framework, utilizing a logistic regression approach implemented in JAGS. Our model incorporates a t-distributed latent variable to address overdispersion and outliers, improving robustness compared to conventional models such as the beta-binomial, binomial-logit-normal, and standard binomial models. Notably, our model targets a pseudo-median that differs from the true discrete median by less than one count; this closed-form quantity provides a robust and interpretable measure of central tendency. For comparability between all models, we additionally make predictions based on the mean proportion; however, this involves an integration step for the t-distributed nuisance parameter. While limited literature specifically addresses outliers in mixed models for bounded count data, this research fills that gap. The practical utility of the model is demonstrated using a longitudinal medication adherence dataset, where patient behavior often results in abrupt changes and outliers within individual trajectories. A simulation study demonstrates the binomial-logit-t model's strong performance, with comparison statistics favoring it among the four evaluated models. An additional data contamination simulation confirms its robustness against outliers. Our robust approach maintains the integrity of the dataset, effectively handling outliers to provide more accurate and reliable parameter estimates.
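The two summaries mentioned above, a closed-form pseudo-median and a mean proportion that requires integrating over the t-distributed latent variable, can be illustrated with a small sketch. It assumes the model has the form y | ε ~ Binomial(n, expit(η + σε)) with ε following a standard t distribution with ν degrees of freedom, and that the pseudo-median is n·expit(η). This parameterization, the function names, and the midpoint quadrature are assumptions made for illustration only; they are not taken from the paper's JAGS implementation.

```python
import math

def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def t_pdf(e, nu):
    """Density of a standard Student-t variable with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(e * e / nu))

def pseudo_median(n, eta):
    """Assumed closed form: binomial mean at the median (zero) of the latent term."""
    return n * expit(eta)

def mean_proportion(eta, sigma, nu, n_grid=4000):
    """E[expit(eta + sigma * eps)] for eps ~ t_nu, via a midpoint rule.

    The substitution eps = tan(theta) maps the real line onto
    (-pi/2, pi/2), so heavy tails are covered without truncation.
    """
    h = math.pi / n_grid
    total = 0.0
    for k in range(n_grid):
        theta = -math.pi / 2 + (k + 0.5) * h
        eps = math.tan(theta)
        jacobian = 1.0 / math.cos(theta) ** 2  # d(eps)/d(theta)
        total += expit(eta + sigma * eps) * t_pdf(eps, nu) * jacobian
    return total * h

if __name__ == "__main__":
    n, eta, sigma, nu = 30, 1.0, 1.0, 2.5  # hypothetical values
    print("pseudo-median count:", pseudo_median(n, eta))
    print("mean proportion:   ", mean_proportion(eta, sigma, nu))
```

Under these assumptions the pseudo-median does not depend on σ or ν, whereas the mean proportion is pulled toward 0.5 relative to expit(η) because it averages over the symmetric latent term.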

Paper Structure

This paper contains 26 sections, 34 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Adherence profiles of 8 patients completing atorvastatin treatment. Black dots indicate the proportion of days the medication was taken as prescribed each month. Blue dashed lines connect these dots, tracing adherence over time and showing strong compliance with the medication schedule.
  • Figure 1: Adherence proportion estimates under the binomial, beta-binomial, binomial-logit-normal, and binomial-logit-$t$ models. The left panel shows patients in the non-intervention group, and the right panel shows those in the intervention group. Green squares with long-dashed lines represent the binomial model, red circles with dashed lines represent the beta-binomial model, black diamonds with dotted lines represent the binomial-logit-normal model, and blue triangles with solid lines represent the binomial-logit-$t$ model. The adherence proportion estimates from the four models are similar across both groups and all time points.
  • Figure 2: Adherence profiles of 8 patients who discontinued atorvastatin treatment. Black dots represent the proportion of days the medication was taken as prescribed each month. Blue dashed lines connect these dots, illustrating adherence trajectories over time, with sharp declines or significant variability before discontinuation.
  • Figure 2: Residual diagnostics across four different models: binomial-logit-$t$, binomial-logit-normal, beta-binomial, and binomial. The QQ plots display the uniformity of the scaled residuals; a closer alignment to the 45-degree red line indicates a better fit. The binomial-logit-$t$ model shows the best fit, with residuals closely aligned across the expected range. The binomial model exhibits the most significant deviations, indicating potential issues with the model's fit and overdispersion. The beta-binomial and binomial-logit-normal models show intermediate performance, with slight deviations.
  • Figure 3: Probability mass function of the binomial-logit-$t$ and binomial distributions for varying $\eta$ (location) and $\sigma$ (scale). Black symbols/lines indicate the binomial distribution, which does not account for overdispersion. Blue symbols/lines ($\nu = 100$) and red symbols/lines ($\nu = 2.5$) represent the binomial-logit-$t$ distribution, both illustrating overdispersion with the same location and scale. The red lines show heavier tails due to the lower degrees of freedom, reflecting the impact of outliers. Increases in $\sigma$ from top to bottom panels result in wider probability spreads, and the rightward shift in $\eta$ across columns shifts the peak towards higher successes.
  • ...and 6 more figures
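
The behavior described in the Figure 3 caption, heavier tails in the probability mass function as $\nu$ decreases, can be reproduced approximately with the sketch below. It assumes the binomial-logit-$t$ probability mass function is the binomial likelihood with success probability expit($\eta + \sigma\varepsilon$), averaged over a standard $t_\nu$ latent variable $\varepsilon$. The function names, the parameter values, and the quadrature scheme are hypothetical and are not taken from the paper.

```python
import math

def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def t_pdf(e, nu):
    """Density of a standard Student-t variable with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(e * e / nu))

def binomial_logit_t_pmf(n, eta, sigma, nu, n_grid=2000):
    """Return [P(Y = 0), ..., P(Y = n)] under the assumed model.

    Integrates the binomial pmf over eps ~ t_nu with a midpoint rule
    after the substitution eps = tan(theta).
    """
    coef = [math.comb(n, y) for y in range(n + 1)]
    pmf = [0.0] * (n + 1)
    h = math.pi / n_grid
    for k in range(n_grid):
        theta = -math.pi / 2 + (k + 0.5) * h
        eps = math.tan(theta)
        weight = t_pdf(eps, nu) / math.cos(theta) ** 2 * h
        p = expit(eta + sigma * eps)
        for y in range(n + 1):
            pmf[y] += weight * coef[y] * p ** y * (1.0 - p) ** (n - y)
    return pmf

if __name__ == "__main__":
    n, eta, sigma = 30, 1.0, 1.0  # hypothetical values
    heavy = binomial_logit_t_pmf(n, eta, sigma, nu=2.5)
    light = binomial_logit_t_pmf(n, eta, sigma, nu=100)
    print("P(Y = 0), nu = 2.5:", heavy[0])
    print("P(Y = 0), nu = 100:", light[0])
```

With the location and scale held fixed, the lower degrees of freedom place more mass on counts far from the center, which is the heavy-tail effect the caption attributes to outliers.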