Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning
Jun-En Ding, Chien-Chin Hsu, Chi-Hsiang Chu, Shuqiang Wang, Feng Liu
TL;DR
CGMCL is a dual-graph cross-modal framework that aligns image and non-image medical data in a shared latent space using a graph attention encoder and a cross-graph contrastive loss. It constructs separate modality graphs for imaging and clinical features and adds an inter-modality feature scaling (IMFES) module, which balances heterogeneous data distributions while preserving structural relationships. The method shows improved accuracy, interpretability, and robustness on Parkinson’s disease SPECT data and a multimodal melanoma dataset, with clearer Grad-CAM localization and interpretable meta-feature insights. Overall, CGMCL advances multimodal medical classification through cross-modal alignment, structured fusion, and clinically meaningful interpretation.
Abstract
The classification of medical images is a pivotal aspect of disease diagnosis, often enhanced by deep learning techniques. However, traditional approaches typically focus on unimodal medical image data and neglect the integration of diverse non-image patient data. This paper proposes a novel Cross-Graph Modal Contrastive Learning (CGMCL) framework that uses structured multimodal data from different domains to improve medical image classification. The model integrates image and non-image data by constructing cross-modality graphs and leveraging contrastive learning to align multimodal features in a shared latent space. An inter-modality feature scaling module further optimizes the representation learning process by reducing the gap between heterogeneous modalities. The proposed approach is evaluated on two datasets: a Parkinson's disease (PD) dataset and a public melanoma dataset. Results demonstrate that CGMCL outperforms conventional unimodal methods in accuracy, interpretability, and early disease prediction. The method also shows superior performance in multi-class melanoma classification. The CGMCL framework provides valuable insights into medical image classification while offering improved disease interpretability and predictive capabilities.
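To illustrate what aligning two modalities in a shared latent space with a contrastive objective can look like, the following is a minimal sketch of a generic symmetric InfoNCE-style loss. It is not the CGMCL loss itself: the function names, the temperature value, and the use of plain embedding lists (with no graph construction or graph attention encoder) are assumptions made for illustration only. Row i of each input is assumed to belong to the same patient.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cross_modal_contrastive_loss(img_emb, clin_emb, temperature=0.5):
    """Generic symmetric InfoNCE-style loss (illustrative, not the paper's exact loss).

    img_emb[i] and clin_emb[i] are assumed to be embeddings of the same patient;
    every other pairing in the batch is treated as a negative.
    """
    n = len(img_emb)
    a = [l2_normalize(v) for v in img_emb]
    b = [l2_normalize(v) for v in clin_emb]

    # Temperature-scaled cosine similarity between every image/clinical pair.
    sim = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def mean_nll_of_diagonal(rows):
        # Cross-entropy where the matching pair (the diagonal) is the target.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_denominator = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_denominator - row[i]
        return total / n

    sim_transposed = [list(col) for col in zip(*sim)]
    # Average the image-to-clinical and clinical-to-image directions.
    return 0.5 * (mean_nll_of_diagonal(sim) + mean_nll_of_diagonal(sim_transposed))


if __name__ == "__main__":
    image_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    clinical_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_modal_contrastive_loss(image_embeddings, clinical_embeddings))
```

In this sketch, embeddings that agree across modalities for the same patient yield a lower loss than mismatched ones, which is the behavior a cross-modal alignment objective is meant to encourage.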
