Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features
Jun-En Ding, Chien-Chin Hsu, Feng Liu
TL;DR
Classifying Parkinson's disease (PD) is difficult when a model relies on imaging data or clinical data in isolation. The authors propose a multimodal framework that builds two graphs, one from SPECT image embeddings and one from clinical features, and learns a shared representation through a contrastive cross-view loss with a co-attention module. A dual-view graph attention network (GAT) architecture fuses the modalities by combining the embeddings from the image and non-image graphs, which the authors report improves robustness and interpretability. On a hospital-based dataset with 12 DaTQUANT features and TRODAT SPECT images, the method achieves an average accuracy of 0.91 and an AUC of 0.93 in five-fold cross-validation, outperforming image-only and other baselines. This work demonstrates the value of leveraging manifold structure and multimodal information for more reliable PD diagnosis.
Abstract
Parkinson's Disease (PD) affects millions of people worldwide and impairs movement. Prior research has used deep learning for PD prediction, focusing primarily on medical images and neglecting the data's underlying manifold structure. This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification. We introduce a novel multimodal co-attention module that integrates embeddings from separate graph views derived from low-dimensional representations of images and clinical features. This enables more robust and structured feature extraction for improved multi-view data analysis. Additionally, a simplified contrastive loss-based fusion method is devised to enhance cross-view fusion learning. Our graph-view multimodal approach achieves an accuracy of 0.91 and an area under the receiver operating characteristic curve (AUC) of 0.93 in five-fold cross-validation. It also demonstrates superior predictive capabilities on non-image data compared with purely machine learning-based methods.
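
To make the two-graph, cross-view idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes k-nearest-neighbour graphs over each modality, replaces the GAT layers and the co-attention module with a single mean-neighbour aggregation step, and substitutes a standard InfoNCE-style loss for the paper's simplified contrastive loss, whose exact form is not given here. All function names, dimensions, and the random data are hypothetical.

```python
import numpy as np

def knn_adjacency(x, k):
    """Symmetric k-nearest-neighbour adjacency (no self-loops) from row features."""
    n = x.shape[0]
    sq = np.sum(x ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # pairwise squared distances
    np.fill_diagonal(d, np.inf)                    # a node is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    a = np.zeros((n, n))
    a[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(a, a.T)                      # symmetrise

def aggregate(a, x):
    """Mean over each node and its neighbours (assumed stand-in for a GAT layer)."""
    a_hat = a + np.eye(a.shape[0])
    return (a_hat / a_hat.sum(axis=1, keepdims=True)) @ x

def l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def cross_view_info_nce(z_a, z_b, temperature=0.5):
    """InfoNCE-style loss: subject i in view A should match subject i in view B."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = (z_a @ z_b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
n = 8                                  # hypothetical number of subjects
x_img = rng.normal(size=(n, 16))       # stand-in for low-dimensional SPECT embeddings
x_clin = rng.normal(size=(n, 12))      # stand-in for the 12 non-image features

a_img = knn_adjacency(x_img, k=3)      # image-view graph
a_clin = knn_adjacency(x_clin, k=3)    # clinical-view graph

# Project both views into a shared 4-dimensional space (random, untrained weights).
w_img = rng.normal(size=(16, 4))
w_clin = rng.normal(size=(12, 4))
z_img = aggregate(a_img, x_img) @ w_img
z_clin = aggregate(a_clin, x_clin) @ w_clin

loss = cross_view_info_nce(z_img, z_clin)
print(f"cross-view contrastive loss: {loss:.4f}")
```

In a trained model the projection weights would be learned so that this loss decreases, pulling each subject's image-view and clinical-view embeddings together while pushing apart embeddings of different subjects.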
