Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features

Jun-En Ding, Chien-Chin Hsu, Feng Liu

TL;DR

Parkinson's disease classification is limited when it relies on imaging data or clinical data alone. The authors propose a multimodal framework that builds two graphs, one from SPECT image embeddings and one from clinical features, and learns a shared representation through a contrastive cross-view loss combined with a co-attention module. A dual-view graph attention network (GAT) combines the embeddings from the image and non-image graphs, which the authors report improves robustness and interpretability. On a hospital-based dataset with 12 DaTQUANT features and TRODAT SPECT images, the method achieves an average accuracy of 0.91 and an AUC of 0.93 in five-fold cross-validation, outperforming image-only and other baselines. The results suggest that exploiting manifold structure and multimodal information makes PD diagnosis more reliable.

Abstract

Parkinson's Disease (PD) affects millions globally, impacting movement. Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure. This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification. We introduce a novel multimodal co-attention module, integrating embeddings from separate graph views derived from low-dimensional representations of images and clinical features. This enables more robust and structured feature extraction for improved multi-view data analysis. Additionally, a simplified contrastive loss-based fusion method is devised to enhance cross-view fusion learning. Our graph-view multimodal approach achieves an accuracy of 0.91 and an area under the receiver operating characteristic curve (AUC) of 0.93 in five-fold cross-validation. It also demonstrates superior predictive capabilities on non-image data compared to solely machine learning-based methods.
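
The two ideas named in the abstract, graph views built from low-dimensional features and a contrastive loss across views, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `knn_adjacency` and `cross_view_contrastive_loss`, the Euclidean k-nearest-neighbour graph, the InfoNCE-style loss, and the temperature value are all assumptions chosen for illustration, and the paper's "simplified contrastive loss" may take a different form.

```python
import numpy as np


def knn_adjacency(features, k):
    """Symmetric k-nearest-neighbour adjacency matrix (Euclidean distance).

    Hypothetical helper: one plausible way to turn per-patient feature
    vectors into a graph view.
    """
    n = features.shape[0]
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=float)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either end selected it


def cross_view_contrastive_loss(z_img, z_clin, temperature=0.5):
    """InfoNCE-style loss between two views (an assumed stand-in loss).

    Row i of z_img and row i of z_clin describe the same patient and form
    the positive pair; every other row in the batch acts as a negative.
    """
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_clin = z_clin / np.linalg.norm(z_clin, axis=1, keepdims=True)
    logits = z_img @ z_clin.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Synthetic stand-in data: 20 patients with 12 clinical features each.
rng = np.random.default_rng(0)
clinical = rng.normal(size=(20, 12))
adjacency = knn_adjacency(clinical, k=3)

# Random vectors standing in for the node embeddings of the two graph views.
z_image = rng.normal(size=(20, 8))
z_clinical = rng.normal(size=(20, 8))
loss = cross_view_contrastive_loss(z_image, z_clinical)
```

In a full model, the adjacency matrices would feed the graph attention layers of each view, and a loss of this kind would pull together the two embeddings of the same patient while pushing apart embeddings of different patients.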

Paper Structure (15 sections, 11 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The workflow of multimodal contrastive cross-view graph learning framework.
  • Figure 2: Visualization of the predictions of the two-graph cross-view GAT model, incorporating three variables from twelve parameters. (A) Scatter plots of three parameters derived from DaTQUANT, showing the data distribution for normal versus abnormal TRODAT SPECT images. (B) ROC curves for each testing set in five-fold cross-validation.
  • Figure 3: The mean and standard deviation performance of our proposed model in terms of sensitivity and specificity across five runs on testing data, based on a varying number of K-neighborhoods.