A mixture model for subtype identification in the context of disease progression modeling

Sofia Kaisaridi, Juliette Ortholand, Caglayan Tuna, Hugues Chabriat, Sophie Tezenas du Montcel

TL;DR

We propose a probabilistic mixture extension of the Disease Course Mapping model, a mixed-effects model, that identifies distinct disease progression subtypes within a population by clustering patients on both the temporal and the spatial variability of their disease trajectories.

Abstract

The progression of chronic diseases often follows highly variable trajectories, and the underlying factors remain poorly understood. Standard mixed-effects models typically represent inter-patient differences as random deviations around a common reference, which may obscure meaningful subgroups. We propose a probabilistic mixture extension of a mixed-effects model, the Disease Course Mapping model, to identify distinct disease progression subtypes within a population. The mixture structure is introduced at the level of the latent individual parameters, enabling clustering based on both temporal and spatial variability in disease trajectories. We evaluated the model through simulation studies to assess classification performance and parameter recovery. Classification accuracy exceeded 90% in the simpler scenarios and remained above 80% in the most complex case, with particularly high recall and precision for fast-progressing clusters. Compared to a post hoc classification approach, the proposed model yielded more accurate parameter estimates, with smaller biases, lower root mean squared errors, and reduced uncertainty. It also correctly recovered the true three-cluster structure in 93% of the simulations. Finally, we applied the model to a longitudinal cohort of CADASIL patients and identified two clinically meaningful clusters that differentiate patients with early versus late onset and fast versus slow progression, with clear spatial patterns across motor and memory scores. Overall, this probabilistic mixture framework offers a robust, interpretable approach for clustering patients based on spatiotemporal disease dynamics.
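To make the idea of a mixture placed on latent individual parameters more concrete, here is a minimal sketch of how cluster membership probabilities could be computed once per-cluster distributions are known. It is an illustration only, not the authors' estimation procedure: the function names, the choice of two latent parameters (a time shift and a log-acceleration), the diagonal Gaussian form, and all numeric values are assumptions introduced here.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def cluster_responsibilities(params, weights, means, sds):
    """Posterior probability of each cluster for one individual.

    params  : latent individual parameters (hypothetical: [time_shift, log_acceleration])
    weights : mixing proportions, one per cluster, summing to 1
    means   : per-cluster list of means, each the same length as params
    sds     : per-cluster list of standard deviations
              (assumption: parameters are independent within a cluster)
    """
    log_joint = []
    for w, mu_k, sd_k in zip(weights, means, sds):
        ll = math.log(w)
        for x, m, s in zip(params, mu_k, sd_k):
            ll += log_normal_pdf(x, m, s)
        log_joint.append(ll)
    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_joint)
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical two-cluster setting: early onset / fast vs. late onset / slow.
weights = [0.5, 0.5]
means = [[45.0, 0.5], [60.0, -0.5]]
sds = [[5.0, 0.3], [5.0, 0.3]]
patient = [47.0, 0.4]  # latent parameters of one illustrative individual
probs = cluster_responsibilities(patient, weights, means, sds)
assigned = probs.index(max(probs))  # hard assignment to the most probable cluster
```

In this toy setting the individual is assigned to the first cluster because both latent parameters lie close to that cluster's means. In a joint mixture model these probabilities would be estimated together with the trajectory parameters, rather than computed after fitting as in a post hoc classification.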

Paper Structure (25 sections, 3 equations, 2 figures)

Figures (2)

  • Figure 1: Confusion matrices illustrating the classification performance of the mixture model (left) and the post hoc classification (right) across the three simulated scenarios. Each cell indicates the proportion of individuals assigned to each predicted cluster relative to their true cluster membership. Diagonal elements represent the recall for each true cluster. Scenario_2_2 refers to the scenario with two scores and two clusters; Scenario_3_2 refers to the scenario with three scores and two clusters; Scenario_multi refers to the scenario with six scores and three clusters.
  • Figure 2: Modeled disease trajectories by cluster and overall average. Modeled trajectories of disease progression are shown for the two identified clusters and the overall average model. The solid lines represent the average model, the dashed lines correspond to Cluster 1, and the dotted lines to Cluster 2. The blue curves depict the evolution of the motor score, while the orange curves represent the memory score. The vertical solid black line indicates the time of the average disease onset.
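The confusion matrices described for Figure 1 report, in each cell, the proportion of individuals assigned to a predicted cluster relative to their true cluster, so the diagonal gives the recall of each true cluster. A minimal sketch of that computation follows. The function names and the toy labels are assumptions made for illustration, and the sketch presumes that predicted cluster labels have already been aligned with the true ones.

```python
def row_normalized_confusion(true_labels, predicted_labels, n_clusters):
    """Confusion matrix whose rows (true clusters) each sum to 1."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # An empty true cluster yields a row of zeros rather than a division error.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def recall_per_cluster(matrix):
    """Diagonal of a row-normalized confusion matrix, i.e. per-cluster recall."""
    return [matrix[k][k] for k in range(len(matrix))]


# Toy labels for five individuals and two clusters (hypothetical data).
true_labels = [0, 0, 0, 1, 1]
predicted_labels = [0, 0, 1, 1, 1]
matrix = row_normalized_confusion(true_labels, predicted_labels, n_clusters=2)
recalls = recall_per_cluster(matrix)
```

Here two of the three individuals in the first true cluster are recovered, and both individuals in the second are, so the second cluster has the higher recall.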