Unified Cross-Modal Attention-Mixer Based Structural-Functional Connectomics Fusion for Neuropsychiatric Disorder Diagnosis
Badhan Mazumder, Lei Wu, Vince D. Calhoun, Dong Hye Ye
TL;DR
The paper addresses schizophrenia (SZ) diagnosis by fusing structural and functional brain connectomes with ConneX, a unified cross-modal attention–mixer framework. It combines explainability-enhanced, modality-specific GNN backbones with a joint multi-view fusion pipeline that builds a unified representation $R^{u}$ and refines it with MLP-Mixer layers, trained under a multi-head loss to balance the modalities. Key contributions include disorder-specific explainability masks for the GNNs, a cross-modal attention–mixer fusion architecture, and comprehensive validation on the FBIRN and COBRE datasets showing improved accuracy over state-of-the-art (SOTA) baselines. The approach advances multimodal connectomics by modeling intra- and inter-modal interactions at both global and local levels, with potential impact on diagnostic accuracy and interpretability in neuropsychiatric disorders.
Abstract
Gaining insight into the structural and functional mechanisms of the brain has been a longstanding focus of neuroscience research, particularly for understanding and treating neuropsychiatric disorders such as schizophrenia (SZ). Nevertheless, most traditional multimodal deep learning approaches fail to fully leverage the complementary characteristics of structural and functional connectomics data to enhance diagnostic performance. To address this issue, we propose ConneX, a multimodal fusion method that integrates a cross-attention mechanism with a multilayer perceptron (MLP)-Mixer for refined feature fusion. Modality-specific backbone graph neural networks (GNNs) were first employed to obtain a feature representation for each modality. A unified cross-modal attention network was then introduced to fuse these embeddings by capturing intra- and inter-modal interactions, while MLP-Mixer layers refined global and local features, leveraging higher-order dependencies for end-to-end classification with a multi-head joint loss. Extensive evaluations demonstrated improved performance on two distinct clinical datasets, highlighting the robustness of the proposed framework.
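The fusion pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify layer sizes, normalization, how the unified representation is assembled, or how the loss heads are weighted. The sketch therefore assumes random embeddings in place of the GNN backbone outputs, single-head cross-attention in each direction, concatenation along the token axis to form the unified representation, one simplified mixer block without layer normalization, and hypothetical loss weights. All names (`cross_attention`, `mixer_block`, `head_weights`, and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(*shape):
    # Small random weights; stand-ins for trained parameters.
    return rng.normal(scale=0.1, size=shape)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    # Queries come from one modality, keys and values from the other.
    q, k, v = query_feats @ w_q, context_feats @ w_k, context_feats @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v

def mlp(x, w1, w2):
    # Two-layer perceptron with ReLU (the original MLP-Mixer uses GELU).
    return np.maximum(x @ w1, 0.0) @ w2

def mixer_block(x, token_w1, token_w2, chan_w1, chan_w2):
    # x has shape (tokens, channels). Layer normalization is omitted for brevity.
    x = x + mlp(x.T, token_w1, token_w2).T  # token mixing (across nodes)
    return x + mlp(x, chan_w1, chan_w2)     # channel mixing (across features)

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

n_rois, dim, hidden, n_classes = 8, 16, 32, 2  # assumed toy sizes

# Stand-ins for the structural and functional GNN backbone embeddings.
h_struct = rng.normal(size=(n_rois, dim))
h_func = rng.normal(size=(n_rois, dim))

# Inter-modal interaction: each modality attends to the other.
s_from_f = cross_attention(h_struct, h_func, init(dim, dim), init(dim, dim), init(dim, dim))
f_from_s = cross_attention(h_func, h_struct, init(dim, dim), init(dim, dim), init(dim, dim))

# Assumed unified representation: residual-fused tokens of both modalities.
unified = np.concatenate([h_struct + s_from_f, h_func + f_from_s], axis=0)

refined = mixer_block(
    unified,
    init(2 * n_rois, hidden), init(hidden, 2 * n_rois),
    init(dim, hidden), init(hidden, dim),
)

# Multi-head joint loss: one fused head plus one head per modality.
label = 1
logits = (
    refined.mean(axis=0) @ init(dim, n_classes),   # fused head
    h_struct.mean(axis=0) @ init(dim, n_classes),  # structural head
    h_func.mean(axis=0) @ init(dim, n_classes),    # functional head
)
head_weights = (1.0, 0.5, 0.5)  # hypothetical weighting
joint_loss = sum(w * cross_entropy(l, label) for w, l in zip(head_weights, logits))
```

Keeping a separate classification head for each modality alongside the fused head is one common way to stop a single modality from dominating training, which is the balancing role the abstract attributes to the multi-head joint loss.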
