Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT
Diego Machado Reyes, Hanqing Chao, Juergen Hahn, Li Shen, Pingkun Yan
TL;DR
The paper tackles early subtyping of Alzheimer's disease by integrating imaging, genetic, and clinical data through a novel tri-modal co-attention mechanism (Tri-COAT). It combines modality-specific encoders with a cross-modal attention module to produce discriminative representations for three subtypes defined by baseline MMSE trajectories, achieving a mean AUROC of approximately $0.733$ on ADNI data and outperforming baselines. A key innovation is the use of integrated gradients together with prompts to large language models (ChatGPT and Bard) to generate interpretable narratives that link cross-modal biomarker associations to known biology, while acknowledging the limitations of LLMs. The work supports the potential of early, explainable, multimodal subtyping for personalized intervention, and it paves the way for extensions to other neurodegenerative diseases and broader clinical workflows.
Abstract
Alzheimer's disease (AD) is the most prevalent neurodegenerative disease; yet its currently available treatments are limited to stopping disease progression. Moreover, the effectiveness of these treatments is not guaranteed due to the heterogeneity of the disease. Therefore, it is essential to be able to identify the disease subtypes at a very early stage. Current data-driven approaches are able to classify the subtypes at later stages of AD or related disorders, but struggle when predicting at the asymptomatic or prodromal stage. Moreover, most existing models either lack explainability behind the classification or use only a single modality for the assessment, limiting the scope of their analysis. Thus, we propose a multimodal framework that uses early-stage indicators such as imaging, genetics, and clinical assessments to classify AD patients into subtypes at early stages. In addition, we build prompts and use large language models, such as ChatGPT, to interpret the findings of our model. In our framework, we propose a tri-modal co-attention mechanism (Tri-COAT) to explicitly learn the cross-modal feature associations. Our proposed model outperforms baseline models and provides insight into key cross-modal feature associations supported by known biological mechanisms.
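To make the idea of learning cross-modal feature associations more concrete, the following is a minimal sketch of co-attention across three modalities. It is an illustration under stated assumptions, not the Tri-COAT implementation: the function names (`cross_attention`, `tri_modal_co_attention`), the toy embeddings, the absence of learned projections, and the choice to average the two cross-modal contexts and add them back to the original tokens are all hypothetical simplifications.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention of one modality's tokens over another's.

    Assumption: no learned query/key/value projections, so the raw token
    vectors play all three roles. Returns (attended tokens, attention weights).
    """
    dim = len(queries[0])
    attended, weights = [], []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        w = softmax(scores)
        weights.append(w)
        attended.append(
            [sum(w_j * v[d] for w_j, v in zip(w, keys_values)) for d in range(dim)]
        )
    return attended, weights


def tri_modal_co_attention(modalities):
    """Let every modality attend to the other two.

    `modalities` maps a modality name to a list of token vectors that share
    one embedding size. Returns the fused tokens per modality and one
    attention map per ordered (query modality, key modality) pair.
    """
    fused, maps = {}, {}
    for name, tokens in modalities.items():
        contexts = []
        for other, other_tokens in modalities.items():
            if other == name:
                continue
            context, w = cross_attention(tokens, other_tokens)
            contexts.append(context)
            maps[(name, other)] = w
        # Hypothetical fusion rule: average the cross-modal contexts and
        # add them to the original token (a residual-style connection).
        fused[name] = [
            [t[d] + sum(c[i][d] for c in contexts) / len(contexts)
             for d in range(len(t))]
            for i, t in enumerate(tokens)
        ]
    return fused, maps


# Toy, made-up embeddings: 2 imaging tokens, 3 genetic tokens, 1 clinical token.
toy = {
    "imaging": [[1.0, 0.0], [0.0, 1.0]],
    "genetics": [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]],
    "clinical": [[0.2, 0.1]],
}
fused, maps = tri_modal_co_attention(toy)
```

Each entry of `maps` is a row-normalized matrix whose rows index features of the querying modality and whose columns index features of the attended modality. Matrices of this kind are what one would inspect to read off cross-modal feature associations, for example which genetic features an imaging feature weights most heavily.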
