MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction

Luhui Cai, Weiming Zeng, Hongyu Chen, Hua Zhang, Yueyang Li, Yu Feng, Hongjie Yan, Lingbin Bian, Wai Ting Siok, Nizhuan Wang

TL;DR

MM-GTUNets tackles large-scale brain disorder (BD) prediction from multi-modal data. It introduces Modality Reward Representation Learning (MRRL) to adaptively construct population graphs and Adaptive Cross-Modal Graph Learning (ACMGL) to fuse imaging and non-imaging information within a GTUNet encoder, which combines Graph UNet and Graph Transformer. The framework yields a joint multi-modal representation with interpretable contribution weights and reports superior performance on ABIDE (healthy controls vs. autism spectrum disorder) and ADHD-200 across multiple metrics, supported by ablations and robustness analyses. Key contributions include the adaptive reward-based population graph construction, the GTUNet-based cross-modal encoder with a multi-modal attention fusion module, and visualization of modality contributions for clinical interpretability. The approach offers scalable, interpretable multi-modal BD prediction with potential to support real-world clinical decision-making and multi-site research.

Abstract

Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) through the integration of both imaging and non-imaging data. However, the effectiveness of GDL-based methods depends heavily on the quality of the multi-modal population graph model and tends to degrade as the graph scale increases. Furthermore, these methods often restrict interactions between imaging and non-imaging data to node-edge interactions within the graph, overlooking complex inter-modal correlations and leading to suboptimal outcomes. To overcome these challenges, we propose MM-GTUNets, an end-to-end graph-transformer-based multi-modal graph deep learning (MMGDL) framework designed for brain disorder prediction at large scale. Specifically, to effectively leverage rich multi-modal information related to diseases, we introduce Modality Reward Representation Learning (MRRL), which adaptively constructs population graphs using a reward system. Additionally, we employ a variational autoencoder to reconstruct latent representations of non-imaging features aligned with imaging features. Based on this, we propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features through a unified GTUNet encoder, which combines the advantages of Graph UNet and Graph Transformer, and a feature fusion module. We validated our method on two public multi-modal datasets, ABIDE and ADHD-200, demonstrating its superior performance in diagnosing BDs. Our code is available at https://github.com/NZWANG/MM-GTUNets.
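
To make the idea of a weighted non-imaging population graph more concrete, the following is a minimal sketch and not the authors' MRRL implementation. It assumes that each type of non-imaging data (here `sex`, `site`, and `age`) has a scalar reward score, that the scores become contribution weights through a softmax, and that the affinity between two subjects is the weighted sum of per-attribute agreement. The function names, the reward values, the age tolerance, and the example subjects are all hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary reward scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_affinity(a, b, tolerance=None):
    """1.0 if two attribute values agree, else 0.0.

    Categorical values must be equal; numeric values (tolerance given)
    must differ by at most `tolerance`.
    """
    if tolerance is None:
        return 1.0 if a == b else 0.0
    return 1.0 if abs(a - b) <= tolerance else 0.0


def non_imaging_affinity(subjects, rewards, tolerances):
    """Build a dense subject-by-subject affinity matrix.

    `rewards` maps attribute name -> hypothetical reward score (assumption:
    in a trained model these would be learned, here they are fixed inputs).
    """
    names = list(rewards)
    weights = dict(zip(names, softmax([rewards[k] for k in names])))
    n = len(subjects)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-affinity in this sketch
            graph[i][j] = sum(
                weights[k]
                * attribute_affinity(subjects[i][k], subjects[j][k], tolerances.get(k))
                for k in names
            )
    return weights, graph


# Hypothetical subjects and reward scores, for illustration only.
subjects = [
    {"sex": "M", "site": "NYU", "age": 12.0},
    {"sex": "M", "site": "NYU", "age": 13.5},
    {"sex": "F", "site": "UCLA", "age": 30.0},
]
rewards = {"sex": 0.2, "site": 1.0, "age": 0.5}
weights, graph = non_imaging_affinity(subjects, rewards, {"age": 2.0})
```

In this toy setting, subjects that agree on every attribute receive the maximum affinity, and an attribute with a higher reward score contributes more to each edge. In the paper the weights are adapted by the reward system rather than supplied by hand.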

Paper Structure (48 sections, 18 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: The proposed MM-GTUNets framework for BDs prediction. In this paper, imaging data and non-imaging data refer to rs-fMRI and clinical data respectively.
  • Figure 2: Affinity Metric Reward System. AMRS adaptively adjusts the contribution weights of each type of non-imaging data and generates the non-imaging affinity graph, making the overall framework's diagnostic process more intelligent.
  • Figure 3: Graph Transformer (GT) architecture [ying2021transformers; shi2021masked]. The GT layer improves global context capture through self-attention, utilizes multi-head attention for diverse feature learning, and prevents over-smoothing with residual connections and layer normalization, enabling deeper architectures and enhanced performance.
  • Figure 4: Visualization of the joint representation of modalities.
  • Figure 5: Accuracy of MM-GTUNets with different embedding dimensions and different pooling ratios.
  • ...and 4 more figures
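
The ingredients named in the Figure 3 caption (self-attention over graph nodes, multiple heads, a residual connection, and layer normalization) can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the paper's GT layer: there are no learned query, key, or value projections (raw features are used directly), each head is a slice of the feature vector, attention is restricted to a hypothetical `neighbors` map that includes self-loops, and normalization is applied after the residual sum.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def attention_head(x, neighbors):
    """Scaled dot-product attention of each node over its neighbors.

    Simplification: queries, keys and values are the raw features.
    """
    dim = len(x[0])
    out = []
    for i, query in enumerate(x):
        idx = neighbors[i]
        scores = [
            sum(a * b for a, b in zip(query, x[j])) / math.sqrt(dim) for j in idx
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append(
            [sum(e / total * x[j][k] for e, j in zip(exps, idx)) for k in range(dim)]
        )
    return out


def gt_layer(x, neighbors, num_heads=2):
    """Multi-head attention, then residual connection, then layer norm."""
    dim = len(x[0])
    assert dim % num_heads == 0, "feature size must split evenly across heads"
    step = dim // num_heads
    heads = [
        attention_head([row[h * step:(h + 1) * step] for row in x], neighbors)
        for h in range(num_heads)
    ]
    out = []
    for i, row in enumerate(x):
        attended = [v for head in heads for v in head[i]]  # concatenate heads
        out.append(layer_norm([a + r for a, r in zip(attended, row)]))
    return out


# Hypothetical node features and neighborhoods (self-loops included).
features = [
    [1.0, 0.0, 0.5, 2.0],
    [0.0, 1.0, 1.5, 0.5],
    [1.0, 1.0, 0.0, 1.0],
]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
updated = gt_layer(features, neighbors)
```

The residual sum keeps each node's own features in the output, which is the mechanism the caption credits with limiting over-smoothing when layers are stacked.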