Modeling Missing at Random Neuropsychological Test Scores Using a Mixture of Binomial Product Experts

Daniel Suen; Yen-Chi Chen

TL;DR

The paper develops a principled mixture of binomial product experts to model multivariate, bounded discrete neuropsychological test scores conditional on baseline covariates, and extends it to MAR missing data via a nested EM with Monte Carlo imputation. By integrating covariate-dependent component weights with fixed-component test-score distributions, the approach yields latent cognitive-ability subgroups and enables simultaneous imputation and clustering. Theoretical results establish generic identifiability under mild conditions, and inference is supported by bootstrap-based confidence intervals and extensive simulations demonstrating consistency and reasonable coverage. Application to the NACC dataset shows meaningful latent strata aligned with cognitive status, with MAR handling significantly affecting clustering and interpretation, underscoring the importance of principled missing-data treatment in dementia research. The framework offers a scalable, interpretable pathway for jointly modeling and imputing multiple discrete outcomes in biomedical settings with covariates and MAR missingness.

Abstract

Multivariate bounded discrete data arise in many fields. In dementia studies, such data are collected when individuals complete neuropsychological tests. We outline a modeling and inference procedure for the joint distribution of the test scores conditional on baseline covariates, building on previous work on mixtures of experts and latent class models. Furthermore, we illustrate how the approach can be extended, using a nested EM algorithm, to settings where the outcome data are missing at random. The proposed model can incorporate covariate information and perform imputation and clustering. We apply our model to simulated data and to an Alzheimer's disease data set.
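
To make the model class in the abstract concrete, here is a minimal sketch of the observed-data log-likelihood for a single individual. It assumes softmax (multinomial logistic) component weights, which is the usual mixture-of-experts gating but is an assumption here, and all names (`observed_loglik`, `beta`, `theta`, `n_max`) are hypothetical. Missing scores are handled by dropping their factors, which is valid under an ignorable MAR mechanism because the scores are independent within a component. This is not the paper's nested EM with Monte Carlo imputation, only an illustration of the likelihood such a procedure would target.

```python
import math


def log_binom_pmf(y, n, p):
    """Log of the Binomial(n, p) probability mass at y, for 0 < p < 1."""
    log_coef = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_coef + y * math.log(p) + (n - y) * math.log1p(-p)


def log_softmax(z):
    """Numerically stable log-softmax of a list of logits."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def observed_loglik(y, x, n_max, beta, theta):
    """Observed-data log-likelihood and component responsibilities.

    y     : list of d test scores, with None marking a missing score
    x     : list of p baseline covariates (include a 1.0 for an intercept)
    n_max : list of d maximum attainable scores (binomial sizes)
    beta  : K lists of p gating coefficients (assumed softmax gating)
    theta : K lists of d success probabilities, one per component and test
    """
    logits = [sum(b * xi for b, xi in zip(beta_k, x)) for beta_k in beta]
    log_w = log_softmax(logits)

    log_joint = []
    for k, lw in enumerate(log_w):
        total = lw
        for j, yj in enumerate(y):
            # A missing score contributes a factor of one after marginalizing.
            if yj is not None:
                total += log_binom_pmf(yj, n_max[j], theta[k][j])
        log_joint.append(total)

    m = max(log_joint)
    loglik = m + math.log(sum(math.exp(v - m) for v in log_joint))
    resp = [math.exp(v - loglik) for v in log_joint]
    return loglik, resp
```

As a usage example, `observed_loglik([27, None], [1.0, 0.3], [30, 10], beta, theta)` with two rows in `beta` and `theta` returns the log-likelihood of the one observed score together with two posterior component probabilities that sum to one. Those probabilities are the quantities a clustering step would use to assign the individual to a latent group.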

Paper Structure (44 sections, 8 theorems, 63 equations, 6 figures, 10 tables, 4 algorithms)

Introduction
The National Alzheimer's Coordinating Center database
Research questions
Literature review
Dementia-related research
Methodology research
Outline
Mixture of Binomial Product Experts
A latent class model for neuropsychological test scores
Model fitting
Identifiability
Missingness in the Outcome Variables
Missing at random and an imputation strategy
Model fitting under a missing at random assumption
Inference
...and 29 more sections

Key Result

Proposition 1

Suppose the following conditions hold. Then, the mixture of binomial product experts is generically identifiable up to permutation of the parameters.

Figures (6)

Figure 1: This flowchart describes the overall inference procedure.
Figure 2: The CDR score distributions for the complete cases, the individuals missing at least one outcome variable, and the entire data set are provided in the left, middle, and right panels, respectively.
Figure 3: Clustering on complete data only. These barplots summarize the composition of each of the five latent groups. We order the groups from most healthy to least healthy. This is reflected in the mean CDR score of each group.
Figure 4: Clustering on the entire data with MAR assumption. These barplots summarize the composition of each of the five latent groups. We order the groups from most healthy to least healthy. This is reflected in the mean CDR score of each group.
Figure 5: This figure depicts the AIC and BIC curves as we vary $K$ for a given simulated data set of size $n=500$ and $\eta=\infty,2$.
...and 1 more figures

Theorems & Definitions (16)

Proposition 1: Sufficient conditions for generic identifiability
Definition 1: Missing at random
Theorem 1: Asymptotic distribution of the MLE
Remark 1
Remark 2
Remark 3
Lemma 1: Nonconcavity of the LI log-likelihood function
Proof: Proof of Lemma 1
Theorem 2: Theorem 4 of Allman et al. (2009)
...and 6 more
