MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning

Liang Peng; Songyue Cai; Zongqian Wu; Huifang Shang; Xiaofeng Zhu; Xiaoxiao Li

TL;DR

MMGPL tackles neurological disorder diagnosis by learning graph prompts that condition a pretrained multimodal encoder. It introduces a multimodal data tokenizer, GPT-4–generated disease concepts for semantic token weighting, and a graph convolutional network that encodes structural information from a concept-guided token graph into prompts. The approach down-weights irrelevant patches and injects structural information into the encoder, achieving state-of-the-art or competitive results on ADNI and ABIDE while producing interpretable outputs. The framework is a flexible way to apply large pretrained models to multimodal medical data analysis.

Abstract

Prompt learning has demonstrated impressive efficacy in fine-tuning multimodal large models for a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods to the diagnosis of neurological disorders still suffers from two issues: (i) existing methods typically treat all patches equally, even though only a small number of patches in neuroimaging are relevant to the disease, and (ii) they ignore the structural information inherent in the brain connection network, which is crucial for understanding and diagnosing neurological disorders. To tackle these issues, we introduce a novel prompt learning model that learns graph prompts during the fine-tuning of multimodal large models for diagnosing neurological disorders. Specifically, we first leverage GPT-4 to obtain relevant disease concepts and compute the semantic similarity between these concepts and all patches. Second, we reduce the weight of irrelevant patches according to the semantic similarity between each patch and the disease-related concepts. Third, we construct a graph among tokens based on these concepts and employ a graph convolutional network layer to extract the structural information of the graph, which is used to prompt the pre-trained multimodal large models for diagnosing neurological disorders. Extensive experiments demonstrate that our method achieves superior performance for neurological disorder diagnosis compared with state-of-the-art methods, and its results are validated by clinicians.
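
The three steps in the abstract (similarity to concepts, token re-weighting, and a graph convolution over a token graph) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the softmax over each token's best concept match, the k-nearest-neighbour graph rule, the function names, and the random embeddings that stand in for real patch tokens and GPT-4 concept embeddings.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between every row of `a` and every row of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T  # shape: (rows of a, rows of b)

def concept_weights(sim):
    """Assumed rule: softmax over each token's highest concept similarity."""
    relevance = sim.max(axis=1)
    e = np.exp(relevance - relevance.max())  # shift for numerical stability
    return e / e.sum()

def build_graph(sim, k=2):
    """Assumed rule: link each token to the k tokens with the most similar
    concept-similarity profile, then symmetrize the adjacency matrix."""
    affinity = cosine_similarity(sim, sim)
    np.fill_diagonal(affinity, -np.inf)  # exclude self from neighbours
    n = sim.shape[0]
    nearest = np.argsort(-affinity, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)

def gcn_layer(adj, x, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)

# Hypothetical stand-ins: 6 patch tokens and 3 concept embeddings of width 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
concepts = rng.normal(size=(3, 8))

sim = cosine_similarity(tokens, concepts)    # token-to-concept similarity
weights = concept_weights(sim)               # low weight = likely irrelevant
weighted_tokens = tokens * weights[:, None]  # down-weight irrelevant patches
adj = build_graph(sim, k=2)                  # concept-guided token graph
graph_prompt = gcn_layer(adj, weighted_tokens, rng.normal(size=(8, 8)))
print(graph_prompt.shape)  # (6, 8)
```

In this sketch `graph_prompt` plays the role of the structural prompt that would be passed to the pretrained encoder; how the prompt is actually combined with the encoder input is not specified in the abstract and is left out here.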

Paper Structure (26 sections, 14 equations, 6 figures, 2 tables)


Introduction
Related works
Multimodal large models
Prompt learning
Graph neural network
Methods
Preliminary and motivations
Multimodal data tokenizer
Patch partitioning
Tokenization
Concept learning
Concept generation
Semantic similarity computation
Graph prompt learning
Graph construction
...and 11 more sections

Figures (6)

Figure 1: The flowchart of the proposed MMGPL, which consists of three modules: a multimodal data tokenizer (light blue block), concept learning (light green block), and graph prompt learning (light yellow block). First, MMGPL divides the multimodal medical data into multiple patches and projects them into a shared embedding space (see "Multimodal data tokenizer"). Second, MMGPL prompts GPT-4 to generate disease-related concepts and learns the weights of tokens based on the semantic similarity between tokens and concepts (see "Concept learning"). Third, MMGPL learns a graph among tokens and extracts structural information to prompt the unified encoder (see "Graph prompt learning"). Finally, MMGPL obtains the output of the unified encoder and uses it to predict the label of the subject.
Figure 2: Performance of MMGPL with different combinations of components on all datasets. "B" denotes the baseline method, "B+G" the baseline with graph prompt learning, "B+C" the baseline with concept learning, and "B+C+G" the baseline with both graph prompt learning and concept learning.
Figure 3: Performance of MMGPL with different modalities.
Figure 4: Heat maps generated by MMGPL for different subjects in the ADNI dataset.
Figure 5: Visualization of the concept-similarity graph on the ADNI dataset. The horizontal and vertical axes represent concepts and tokens. Colors indicate the category a concept belongs to: red text marks concepts related to NC, green text marks concepts related to LMCI, and blue text marks concepts related to AD.
...and 1 more figures
