MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction
Luhui Cai, Weiming Zeng, Hongyu Chen, Hua Zhang, Yueyang Li, Yu Feng, Hongjie Yan, Lingbin Bian, Wai Ting Siok, Nizhuan Wang
TL;DR
MM-GTUNets tackles large-scale brain disorder (BD) prediction from multi-modal data. It introduces Modality Reward Representation Learning (MRRL) to adaptively construct population graphs and Adaptive Cross-Modal Graph Learning (ACMGL) to fuse imaging and non-imaging information within a unified GTUNet (Graph TransUNet) encoder. The framework yields a modality-joint representation with interpretable contribution weights and reports superior performance on ABIDE (healthy controls vs. autism spectrum disorder) and ADHD-200 across multiple metrics, supported by ablations and robustness analyses. Key contributions include the adaptive reward-based population graph construction, the GTUNet-based cross-modal encoder with a multi-modal attention fusion module, and visualization of modality contributions for clinical interpretability. The approach offers scalable, interpretable multi-modal BD prediction with potential to support real-world clinical decision-making and multi-site research.
Abstract
Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) by integrating imaging and non-imaging data. However, the effectiveness of GDL-based methods depends heavily on the quality of the multi-modal population graph and tends to degrade as the graph scale increases. Furthermore, these methods often restrict interactions between imaging and non-imaging data to node-edge interactions within the graph, overlooking complex inter-modal correlations and leading to suboptimal outcomes. To overcome these challenges, we propose MM-GTUNets, an end-to-end graph-transformer-based multi-modal graph deep learning (MMGDL) framework designed for large-scale brain disorder prediction. Specifically, to effectively leverage rich disease-related multi-modal information, we introduce Modality Reward Representation Learning (MRRL), which adaptively constructs population graphs using a reward system. Additionally, we employ a variational autoencoder to reconstruct latent representations of non-imaging features aligned with imaging features. Building on this, we propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features through a unified GTUNet encoder, combining the advantages of Graph UNet and Graph Transformer, and a feature fusion module. We validated our method on two public multi-modal datasets, ABIDE and ADHD-200, demonstrating its superior performance in diagnosing BDs. Our code is available at https://github.com/NZWANG/MM-GTUNets.
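
To make the idea of a modality-weighted population graph concrete, the following is a minimal illustrative sketch and not the authors' implementation (the official code is in the repository linked above). It assumes that each subject is a node, that per-modality edges come from cosine similarity, that modality contribution weights are a softmax over hypothetical scores, and that weak edges are pruned with a fixed threshold. The abstract specifies none of these choices; in MRRL the weights would be learned through the reward system rather than set by hand. The function names, scores, and threshold below are all hypothetical.

```python
import numpy as np


def softmax(scores):
    """Turn raw modality scores into weights that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp / exp.sum()


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between rows (one row per subject)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T


def build_population_graph(imaging, non_imaging, modality_scores, threshold=0.5):
    """Fuse per-modality similarity graphs into one weighted adjacency matrix.

    Assumption: `modality_scores` stands in for learned modality
    contributions; here they are fixed, hand-picked numbers.
    """
    weights = softmax(modality_scores)
    similarities = [
        cosine_similarity_matrix(imaging),
        cosine_similarity_matrix(non_imaging),
    ]
    fused = sum(w * s for w, s in zip(weights, similarities))
    adjacency = np.where(fused > threshold, fused, 0.0)  # prune weak edges
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    return adjacency, weights


# Toy data: 6 subjects, 16 imaging features, 3 non-imaging features.
rng = np.random.default_rng(0)
imaging = rng.normal(size=(6, 16))
non_imaging = rng.normal(size=(6, 3))

adjacency, weights = build_population_graph(
    imaging, non_imaging, modality_scores=[0.8, 0.2], threshold=0.1
)
print(adjacency.shape)
print(weights)
```

In this sketch the returned weights play the role of the interpretable modality contributions: a larger weight means that modality has more influence on which subjects are connected in the population graph.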
