Modeling Missing at Random Neuropsychological Test Scores Using a Mixture of Binomial Product Experts

Daniel Suen, Yen-Chi Chen

TL;DR

The paper develops a mixture of binomial product experts to model multivariate, bounded discrete neuropsychological test scores conditional on baseline covariates, and extends it to outcomes that are missing at random (MAR) via a nested EM algorithm with Monte Carlo imputation. By combining covariate-dependent component weights with fixed-component test-score distributions, the approach yields latent cognitive-ability subgroups and enables simultaneous imputation and clustering. Theoretical results establish generic identifiability under mild conditions, and inference is supported by bootstrap-based confidence intervals and by simulations demonstrating consistency and reasonable coverage. Application to the NACC dataset shows latent strata aligned with cognitive status; handling the missing data under MAR substantially affects the clustering and its interpretation, underscoring the importance of principled missing-data treatment in dementia research. The framework offers a scalable, interpretable way to jointly model and impute multiple discrete outcomes in biomedical settings with covariates and MAR missingness.

Abstract

Multivariate bounded discrete data arises in many fields. In the setting of dementia studies, such data is collected when individuals complete neuropsychological tests. We outline a modeling and inference procedure that can model the joint distribution conditional on baseline covariates, leveraging previous work on mixtures of experts and latent class models. Furthermore, we illustrate how the work can be extended when the outcome data is missing at random using a nested EM algorithm. The proposed model can incorporate covariate information and perform imputation and clustering. We apply our model on simulated data and an Alzheimer's disease data set.
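
As a rough illustration of the model class described above, the following Python sketch evaluates the complete-data log-likelihood of a mixture of binomial product experts and the posterior component probabilities used for clustering. It is a minimal sketch under stated assumptions, not the authors' implementation: the softmax (multinomial logistic) form of the covariate-dependent weights, the intercept column, and all names (`log_gating_weights`, `log_expert_density`, `n_trials`, `beta`, `theta`) are hypothetical choices made for this example. It does not cover the missing-data extension or the nested EM algorithm.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import binom


def log_gating_weights(X, beta):
    """Log component weights given covariates, shape (n, K).

    Assumption: softmax gating with an intercept.
    X is (n, q); beta is (K, q + 1) with the intercept coefficient first.
    """
    design = np.column_stack([np.ones(len(X)), X])
    scores = design @ beta.T
    return scores - logsumexp(scores, axis=1, keepdims=True)


def log_expert_density(Y, n_trials, theta):
    """Log density of each expert (a product of independent binomials), shape (n, K).

    Y is (n, d) integer scores, n_trials is (d,) maximum scores,
    theta is (K, d) success probabilities that do not depend on covariates.
    """
    # Broadcast to (n, K, d), then sum log-pmfs over the d test scores.
    logpmf = binom.logpmf(Y[:, None, :], n_trials[None, None, :], theta[None, :, :])
    return logpmf.sum(axis=2)


def log_likelihood_and_responsibilities(X, Y, n_trials, beta, theta):
    """Return the total log-likelihood and the (n, K) posterior component probabilities."""
    log_joint = log_gating_weights(X, beta) + log_expert_density(Y, n_trials, theta)
    log_marginal = logsumexp(log_joint, axis=1)
    responsibilities = np.exp(log_joint - log_marginal[:, None])
    return log_marginal.sum(), responsibilities
```

Calling `log_likelihood_and_responsibilities(X, Y, n_trials, beta, theta)` on fully observed data returns a scalar log-likelihood and a matrix whose rows sum to one; assigning each individual to the component with the largest posterior probability gives a hard clustering.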

Paper Structure

This paper contains 44 sections, 8 theorems, 63 equations, 6 figures, 10 tables, 4 algorithms.

Key Result

Proposition 1

Suppose the following conditions hold. Then, the mixture of binomial product experts is generically identifiable up to permutation of the parameters.

Figures (6)

  • Figure 1: This flowchart describes the overall inference procedure.
  • Figure 2: The CDR score distributions for the complete cases, the individuals missing at least one outcome variable, and the entire data set are provided in the left, middle, and right panels, respectively.
  • Figure 3: Clustering on complete data only. These barplots summarize the composition of each of the five latent groups. We order the groups from most healthy to least healthy. This is reflected in the mean CDR score of each group.
  • Figure 4: Clustering on the entire data with MAR assumption. These barplots summarize the composition of each of the five latent groups. We order the groups from most healthy to least healthy. This is reflected in the mean CDR score of each group.
  • Figure 5: This figure depicts the AIC and BIC curves as we vary $K$ for a given simulated data set of size $n=500$ and $\eta=\infty,2$.
  • ...and 1 more figure

Theorems & Definitions (16)

  • Proposition 1: Sufficient conditions for generic identifiability
  • Definition 1: Missing at random
  • Theorem 1: Asymptotic distribution of the MLE
  • Proposition 1: Sufficient conditions for generic identifiability
  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 1: Nonconcavity of the LI log-likelihood function
  • Proof: Proof of Lemma 1
  • Theorem 2: Theorem 4 of Allman et al. (2009)
  • ...and 6 more