Modeling Missing at Random Neuropsychological Test Scores Using a Mixture of Binomial Product Experts
Daniel Suen, Yen-Chi Chen
TL;DR
The paper develops a mixture of binomial product experts to model multivariate, bounded discrete neuropsychological test scores conditional on baseline covariates, and extends it to outcomes that are missing at random (MAR) via a nested EM algorithm with Monte Carlo imputation. By combining covariate-dependent component weights with fixed-component test-score distributions, the approach yields latent cognitive-ability subgroups and enables simultaneous imputation and clustering. Theoretical results establish generic identifiability under mild conditions, and inference is supported by bootstrap-based confidence intervals and simulations that demonstrate consistency and reasonable coverage. An application to the NACC dataset shows latent strata aligned with cognitive status, and the way MAR missingness is handled substantially affects the clustering and its interpretation, underscoring the importance of principled missing-data treatment in dementia research. The framework offers a scalable, interpretable way to jointly model and impute multiple discrete outcomes in biomedical settings with covariates and MAR missingness.
Abstract
Multivariate bounded discrete data arises in many fields. In the setting of dementia studies, such data is collected when individuals complete neuropsychological tests. We outline a modeling and inference procedure for the joint distribution of the test scores conditional on baseline covariates, leveraging previous work on mixtures of experts and latent class models. Furthermore, we illustrate how the work can be extended to outcome data that is missing at random using a nested EM algorithm. The proposed model can incorporate covariate information and perform imputation and clustering. We apply our model to simulated data and an Alzheimer's disease data set.
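To make the model structure concrete, the following is a minimal sketch of a mixture of binomial product experts, not the authors' implementation. It assumes a softmax (multinomial logistic) gate for the covariate-dependent weights, which the abstract does not specify. The function names (`gating_weights`, `observed_likelihood`, `posterior_membership`) and all parameter values are hypothetical. Missing scores are encoded as `None` and their binomial factors are dropped. That is the usual marginalization when scores are independent within a component, and it is not the nested EM algorithm itself.

```python
import math


def binom_pmf(y, n, p):
    """Binomial probability of score y out of a maximum score n."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def gating_weights(x, beta):
    """Covariate-dependent component weights (softmax gate: an assumption).

    beta holds one coefficient vector per component: [intercept, slopes...].
    """
    scores = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in beta]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def component_terms(y, x, n_max, beta, theta):
    """Weight times product of binomial pmfs over the observed scores only."""
    terms = []
    for w, probs in zip(gating_weights(x, beta), theta):
        prod = 1.0
        for y_j, n_j, p_j in zip(y, n_max, probs):
            if y_j is not None:  # a missing score contributes a factor of 1
                prod *= binom_pmf(y_j, n_j, p_j)
        terms.append(w * prod)
    return terms


def observed_likelihood(y, x, n_max, beta, theta):
    """Observed-data likelihood of one individual's (possibly partial) scores."""
    return sum(component_terms(y, x, n_max, beta, theta))


def posterior_membership(y, x, n_max, beta, theta):
    """Posterior probability of each latent component given the observed scores."""
    terms = component_terms(y, x, n_max, beta, theta)
    total = sum(terms)
    return [t / total for t in terms]


# Hypothetical setup: two tests (maximum scores 3 and 5), two components,
# and one baseline covariate.
n_max = [3, 5]
beta = [[0.0, 0.0], [0.5, -1.0]]
theta = [[0.8, 0.7], [0.3, 0.2]]  # per-component success probabilities
x = [1.2]

print(observed_likelihood([2, None], x, n_max, beta, theta))
print(posterior_membership([2, None], x, n_max, beta, theta))
```

The posterior membership probabilities are what a clustering step would use to assign an individual to a latent subgroup. An imputation step could draw a component from those probabilities and then draw each missing score from that component's binomial distribution.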
