MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning
Liang Peng, Songyue Cai, Zongqian Wu, Huifang Shang, Xiaofeng Zhu, Xiaoxiao Li
TL;DR
MMGPL tackles neurological disorder diagnosis by integrating prompt learning with graph prompts to condition a pretrained multimodal encoder. It introduces a multimodal data tokenizer, GPT-4–generated disease concepts for semantic token weighting, and a graph convolutional network that encodes brain connectivity into prompts. The approach reduces noise from irrelevant patches and injects structural information through a concept-guided graph, achieving state-of-the-art or competitive results on ADNI and ABIDE with clear interpretability benefits. This work offers a scalable, flexible framework for multimodal medical data analysis that leverages large pretrained models for improved diagnostic performance and clinical insight.
Abstract
Prompt learning has demonstrated impressive efficacy in fine-tuning multimodal large models for a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods to the diagnosis of neurological disorders still suffers from two issues: (i) existing methods typically treat all patches equally, even though only a small number of patches in neuroimaging are relevant to the disease, and (ii) they ignore the structural information inherent in the brain connection network, which is crucial for understanding and diagnosing neurological disorders. To tackle these issues, we introduce a novel prompt learning model that learns graph prompts during the fine-tuning of multimodal large models for diagnosing neurological disorders. Specifically, we first leverage GPT-4 to obtain relevant disease concepts and compute the semantic similarity between these concepts and all patches. Second, we reduce the weight of irrelevant patches according to the semantic similarity between each patch and the disease-related concepts. Moreover, we construct a graph among tokens based on these concepts and employ a graph convolutional network layer to extract the structural information of the graph, which is used to prompt the pre-trained multimodal large models for diagnosing neurological disorders. Extensive experiments demonstrate that our method achieves superior performance for neurological disorder diagnosis compared with state-of-the-art methods and is validated by clinicians.
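
The pipeline described in the abstract (concept-to-patch similarity, down-weighting of irrelevant patches, a concept-based token graph, and one graph convolutional layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the weighting rule, or the graph construction, so the choices below are assumptions made for illustration. Specifically, cosine similarity, a temperature-scaled softmax over each token's best concept match, an adjacency built from the cosine similarity of concept profiles, and the standard symmetric GCN normalization are all assumed, and every function name is hypothetical. Token and concept embeddings stand in for the outputs of a pretrained multimodal encoder.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def concept_weights(tokens, concepts, temperature=0.1):
    """Score each token by its similarity to the disease concepts.

    tokens:   (n_tokens, dim) patch/token embeddings
    concepts: (n_concepts, dim) concept text embeddings
    Returns the (n_tokens, n_concepts) cosine similarity matrix and a
    (n_tokens,) weight vector that sums to 1.
    """
    sim = l2_normalize(tokens) @ l2_normalize(concepts).T
    relevance = sim.max(axis=1)  # assumed rule: best-matching concept
    z = (relevance - relevance.max()) / temperature  # stable softmax
    w = np.exp(z)
    return sim, w / w.sum()


def concept_graph(sim):
    """Assumed graph rule: connect tokens with similar concept profiles."""
    profile = l2_normalize(sim)
    adj = np.clip(profile @ profile.T, 0.0, None)  # keep non-negative edges
    np.fill_diagonal(adj, 1.0)  # self-loops
    return adj


def gcn_layer(x, adj, weight):
    """One GCN layer: ReLU(D^{-1/2} A D^{-1/2} X W)."""
    deg = adj.sum(axis=1)  # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_hat = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ x @ weight, 0.0)


def graph_prompts(tokens, concepts, weight, temperature=0.1):
    """Re-weight tokens by concept relevance, then encode graph structure."""
    sim, w = concept_weights(tokens, concepts, temperature)
    weighted = tokens * w[:, None]  # irrelevant tokens are shrunk
    return gcn_layer(weighted, concept_graph(sim), weight)
```

In this sketch, `graph_prompts` returns one vector per token; how such vectors are injected into the pretrained model as prompts is outside what the abstract describes and is left open here.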
