Unified Cross-Modal Attention-Mixer Based Structural-Functional Connectomics Fusion for Neuropsychiatric Disorder Diagnosis

Badhan Mazumder, Lei Wu, Vince D. Calhoun, Dong Hye Ye

TL;DR

The paper addresses schizophrenia (SZ) diagnosis by fusing structural and functional brain connectomes with ConneX, a unified cross-modal attention-mixer framework. Modality-specific GNN backbones, fine-tuned with explainability masks, feed a joint multi-view fusion pipeline that adds a unified representation $R^{u}$ and MLP-Mixer refinement, and the whole model is trained under a multi-head loss to balance the modalities. Key contributions are disorder-specific explainability masks for the GNNs, the cross-modal attention-mixer fusion architecture, and validation on the FBIRN and COBRE datasets showing improved accuracy over state-of-the-art baselines. By modeling intra- and inter-modal interactions at both global and local levels, the approach could improve diagnostic accuracy and interpretability for neuropsychiatric disorders.

Abstract

Gaining insights into the structural and functional mechanisms of the brain has been a longstanding focus in neuroscience research, particularly in the context of understanding and treating neuropsychiatric disorders such as schizophrenia (SZ). Nevertheless, most traditional multimodal deep learning approaches fail to fully leverage the complementary characteristics of structural and functional connectomics data to enhance diagnostic performance. To address this issue, we propose ConneX, a multimodal fusion method that integrates a cross-attention mechanism and a multilayer perceptron (MLP)-Mixer for refined feature fusion. Modality-specific backbone graph neural networks (GNNs) are first employed to obtain a feature representation for each modality. A unified cross-modal attention network then fuses these embeddings by capturing intra- and inter-modal interactions, while MLP-Mixer layers refine global and local features, leveraging higher-order dependencies for end-to-end classification with a multi-head joint loss. Extensive evaluations demonstrate improved performance on two distinct clinical datasets, highlighting the robustness of the proposed framework.
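To make the fusion idea in the abstract more concrete, the following is a minimal sketch of bidirectional cross-attention between two modality embeddings followed by one Mixer-style block. It is not the authors' implementation: the identity query/key/value projections, the fusion by per-region concatenation, the ReLU activation, the omission of layer normalization, the random stand-in embeddings, and all function names are assumptions made only for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    """Scaled dot-product attention: rows of `queries` attend over rows of `context`.

    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(context))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context), weights

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU, applied to each row of x."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def mixer_block(x, token_w, channel_w):
    """Token mixing (across rows), then channel mixing (across columns), with residuals."""
    x = add(x, transpose(mlp(transpose(x), *token_w)))
    return add(x, mlp(x, *channel_w))

rng = random.Random(0)
n_regions, dim, hidden = 4, 3, 8
structural = random_matrix(n_regions, dim, rng)  # stand-in for a structural GNN embedding
functional = random_matrix(n_regions, dim, rng)  # stand-in for a functional GNN embedding

# Each modality queries the other, giving two cross-attended views.
s_from_f, w_sf = cross_attention(structural, functional)
f_from_s, w_fs = cross_attention(functional, structural)

# Assumed fusion: concatenate the two views per region -> n_regions x (2 * dim).
fused = [a + b for a, b in zip(s_from_f, f_from_s)]

token_w = (random_matrix(n_regions, hidden, rng), random_matrix(hidden, n_regions, rng))
channel_w = (random_matrix(2 * dim, hidden, rng), random_matrix(hidden, 2 * dim, rng))
refined = mixer_block(fused, token_w, channel_w)
print(len(refined), len(refined[0]))
```

In this sketch the token-mixing step operates across brain regions and the channel-mixing step operates across feature dimensions, which is one way to read the abstract's distinction between global and local refinement.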

Paper Structure

This paper contains 17 sections, 5 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Method overview. GNNs were trained on modality-wise connectome graphs. An explanation generator then produced a mask shared across subjects, which was used to fine-tune the base GNNs and enhance the learned representations. The resulting structural and functional representations were cross-attended using the proposed fusion method, ConneX, in which they were also combined and fed in as an additional unified representation (highlighted with green arrows). A multi-head joint loss was used for the final classification task.
  • Figure 2: Axial visualization of the 100 most significant brain network connections, spanning the subcortical (SCN), auditory (ADN), sensorimotor (SMN), visual (VSN), cognitive control (CON), default mode (DMN), and cerebellar (CBN) networks, for both structural and functional connectomes across the SZ and healthy control (HC) groups in the FBIRN and COBRE datasets. Connections within the same brain network are highlighted with distinct colors, while inter-network connections are shown in gray. Edge width reflects the edge weight in the explanatory graph.
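The multi-head joint loss named in the Figure 1 caption can be illustrated with a small sketch. The exact form used in the paper is not given here, so this assumes a weighted sum of cross-entropy terms, one per classification head; the head names, logits, and weights below are hypothetical values chosen only for illustration.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw class logits (log-sum-exp form)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def joint_loss(head_logits, label, head_weights):
    """Assumed joint loss: weighted sum of the per-head cross-entropy losses."""
    return sum(head_weights[name] * cross_entropy(logits, label)
               for name, logits in head_logits.items())

# Hypothetical two-class logits from three heads for a single subject.
head_logits = {
    "structural": [0.2, -0.1],
    "functional": [-0.3, 0.4],
    "fused": [1.5, -1.0],
}
# Hypothetical weights; in practice these would be tuned to balance the modalities.
head_weights = {"structural": 0.25, "functional": 0.25, "fused": 0.5}

loss = joint_loss(head_logits, label=0, head_weights=head_weights)
print(f"joint loss: {loss:.4f}")
```

Giving each modality its own head keeps a training signal flowing to both backbones, so one modality is less likely to dominate the fused prediction.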