An Interpretable Cross-Attentive Multi-modal MRI Fusion Framework for Schizophrenia Diagnosis
Ziyu Zhou, Anton Orlichenko, Gang Qu, Zening Fu, Vince D Calhoun, Zhengming Ding, Yu-Ping Wang
TL;DR
Schizophrenia diagnosis benefits from multi-modal MRI, yet the heterogeneity of fMRI and sMRI hinders simple fusion. The authors propose CAMF, a Cross-Attentive Multi-modal Fusion framework that uses self-attention to model intra-modal interactions and cross-attention to capture inter-modal interactions, then adaptively fuses the results into a final representation $f_O$ for classification. Training uses the standard cross-entropy loss with the Adam optimizer and He initialization, and Score-CAM provides interpretable saliency maps that identify disease-relevant functional networks and brain regions. On combined COBRE/FBIRN/MPRC data and on the BSNIP dataset, CAMF outperforms baseline methods, and its interpretations agree with established biomarkers, highlighting its potential both to improve diagnostic accuracy and to offer mechanistic insight into schizophrenia.
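The self-attention, cross-attention and adaptive-fusion pipeline can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the token shapes, the shared width `d`, mean pooling of each branch, and the softmax gate standing in for "adaptive" fusion are all hypothetical choices. The weights are randomly He-initialized rather than trained with Adam. `camf_fusion_sketch` and its helpers are illustrative names.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q_in, kv_in, Wq, Wk, Wv):
    """Scaled dot-product attention: queries from q_in, keys/values from kv_in.

    q_in is kv_in  -> self-attention (intra-modal interactions)
    different inputs -> cross-attention (inter-modal interactions)
    """
    Q, K, V = q_in @ Wq, kv_in @ Wk, kv_in @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V


def he_init(rng, fan_in, fan_out):
    # He (Kaiming) normal initialization: std = sqrt(2 / fan_in)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))


def cross_entropy(logits, label):
    # Single-sample cross-entropy: -log softmax(logits)[label]
    z = logits - logits.max()
    return np.log(np.exp(z).sum()) - z[label]


def camf_fusion_sketch(F, S, d=16, seed=0):
    """Hypothetical fusion of fMRI tokens F (n_f x d_f) and sMRI tokens S (n_s x d_s)."""
    rng = np.random.default_rng(seed)

    def w():
        # Fresh d x d projection matrix (untrained, for illustration only)
        return he_init(rng, d, d)

    # Assumed linear embeddings into a shared width d
    F_e = F @ he_init(rng, F.shape[1], d)
    S_e = S @ he_init(rng, S.shape[1], d)
    # Intra-modal interactions
    F_self = attention(F_e, F_e, w(), w(), w())
    S_self = attention(S_e, S_e, w(), w(), w())
    # Inter-modal interactions: each modality queries the other
    F_cross = attention(F_e, S_e, w(), w(), w())
    S_cross = attention(S_e, F_e, w(), w(), w())
    # Pool tokens into one vector per branch -> shape (4, d)
    branches = np.stack([b.mean(axis=0) for b in (F_self, S_self, F_cross, S_cross)])
    # Adaptive fusion (assumption): softmax gate over the four branches
    gate = softmax(branches @ rng.normal(size=d))
    f_O = gate @ branches  # fused representation, shape (d,)
    return f_O, gate
```

In this sketch the gate weights come from a softmax, so $f_O$ is a convex combination of the four branch vectors; after training, such weights would let the classifier lean on whichever intra- or inter-modal view is most informative. A linear head on $f_O$ followed by `cross_entropy` would complete the classification step.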
Abstract
Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely used in the diagnosis of mental disorders. However, combining the complementary information from these two modalities is challenging because of their heterogeneity. Many existing methods fail to capture the interactions between the modalities and often default to a simple combination of latent features. In this paper, we propose a novel Cross-Attentive Multi-modal Fusion framework (CAMF) that captures both intra-modal and inter-modal relationships between fMRI and sMRI to enhance the multi-modal data representation. Specifically, CAMF employs self-attention modules to identify interactions within each modality and cross-attention modules to identify interactions between modalities. The resulting latent features from both modalities are then adaptively integrated. Evaluations on two large multi-modal brain imaging datasets show that this approach significantly improves classification accuracy, with CAMF consistently outperforming existing methods. Furthermore, we apply gradient-guided Score-CAM to identify and interpret the functional networks and brain regions critical to schizophrenia. The biomarkers identified by CAMF are consistent with established research, potentially offering new insights into the diagnosis and pathological endophenotypes of schizophrenia.
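The interpretation step can be illustrated with a sketch of the standard Score-CAM procedure. Each activation map of a convolutional layer is min-max normalized to [0, 1] and used as a mask on the input. The increase in the target-class probability over a baseline input then becomes that map's weight, and the saliency map is the ReLU of the weighted sum of maps. Note that vanilla Score-CAM uses forward passes rather than gradients, so the "gradient-guided" variant used in the paper may differ. `ToyConvNet`, the 1-D toy input, the all-zero baseline, and the use of raw probability increases as weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax for a 1-D logit vector
    e = np.exp(z - z.max())
    return e / e.sum()


class ToyConvNet:
    """Hypothetical 1-D CNN: conv (K filters) -> ReLU -> global average pool -> linear head."""

    def __init__(self, n_filters=4, width=5, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        self.filters = rng.normal(size=(n_filters, width))
        self.head = rng.normal(size=(n_filters, n_classes))

    def activations(self, x):
        # One ReLU activation map per filter, same length as x (assumes len(x) >= width)
        maps = [np.convolve(x, f, mode="same") for f in self.filters]
        return np.maximum(0.0, np.stack(maps))

    def probs(self, x):
        pooled = self.activations(x).mean(axis=1)
        return softmax(pooled @ self.head)


def score_cam(model, x, target):
    A = model.activations(x)  # shape (K, N)
    baseline = model.probs(np.zeros_like(x))[target]  # assumed all-zero baseline input
    weights = np.zeros(len(A))
    for k, a in enumerate(A):
        span = a.max() - a.min()
        if span == 0:
            continue  # a flat map carries no spatial information
        mask = (a - a.min()) / span  # normalize the map to [0, 1]
        # Weight = increase in target-class confidence when the map masks the input
        weights[k] = model.probs(x * mask)[target] - baseline
    return np.maximum(0.0, weights @ A)  # saliency over the N input positions
```

Because each weight is derived from the model's own output on a masked input, regions whose masks preserve the target-class confidence receive high saliency. In an imaging setting, those regions would correspond to the candidate networks or brain regions.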
