Cross-Attention Fusion of MRI and Jacobian Maps for Alzheimer's Disease Diagnosis
Shijia Zhang, Xiyu Ding, Brian Caffo, Junyu Chen, Cindy Zhang, Hadi Kharrazi, Zheyu Wang
TL;DR
This work addresses early Alzheimer's disease diagnosis by fusing sMRI intensity information with Jacobian determinant maps (JSM) of brain deformations through cross-attention. A lightweight dual 3D CNN encoder processes both modalities, and the cross-attention layer uses deformation features as queries to selectively attend to sMRI patterns, so the relationship between deformation and structure is modeled directly. On ADNI data, the method achieves mean ROC-AUCs of $0.903 \pm 0.033$ for CN vs. AD and $0.692 \pm 0.061$ for CN vs. MCI with only $1.56 \times 10^{6}$ parameters, roughly 40x fewer than large pre-trained encoders. The results indicate that cross-attention fusion can deliver competitive diagnostic performance at low computational cost, with potential for broader application to multimodal neuroimaging and other neurodegenerative disease classification tasks.
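To make the query/key roles concrete, the following is a minimal sketch of single-head scaled dot-product cross-attention in which JSM-derived tokens form the queries and sMRI-derived tokens form the keys and values. It is an illustration only, not the authors' implementation: the token counts, feature sizes, random projection matrices, and the names `cross_attention`, `jsm_tokens`, and `smri_tokens` are all assumptions, and the real model's encoder outputs, number of heads, and any residual or normalization layers are not specified in this summary.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(jsm_tokens, smri_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    Queries come from the deformation (JSM) features; keys and values
    come from the sMRI intensity features.
    """
    q = jsm_tokens @ w_q    # (n_jsm, d_model)
    k = smri_tokens @ w_k   # (n_smri, d_model)
    v = smri_tokens @ w_v   # (n_smri, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_jsm, n_smri)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes: 6 JSM tokens, 10 sMRI tokens, 16-dim encoder features.
rng = np.random.default_rng(0)
n_jsm, n_smri, d_in, d_model = 6, 10, 16, 4
jsm_tokens = rng.standard_normal((n_jsm, d_in))
smri_tokens = rng.standard_normal((n_smri, d_in))
w_q, w_k, w_v = (0.1 * rng.standard_normal((d_in, d_model)) for _ in range(3))

fused, weights = cross_attention(jsm_tokens, smri_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` is a distribution over sMRI tokens for one deformation token, which is the sense in which the deformation features "selectively attend" to intensity patterns.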
Abstract
Early diagnosis of Alzheimer's disease (AD) is critical for intervention before irreversible neurodegeneration occurs. Structural MRI (sMRI) is widely used for AD diagnosis, but conventional deep learning approaches rely primarily on intensity-based features, which require large datasets to capture subtle structural changes. Jacobian determinant maps (JSM) provide complementary information by encoding localized brain deformations, yet existing multimodal fusion strategies fail to fully integrate these features with sMRI. We propose a cross-attention fusion framework to model the intrinsic relationship between sMRI intensity and JSM-derived deformations for AD classification. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we compare cross-attention, pairwise self-attention, and bottleneck attention with four pre-trained 3D image encoders. Cross-attention fusion achieves superior performance, with mean ROC-AUC scores of 0.903 (± 0.033) for AD vs. cognitively normal (CN) and 0.692 (± 0.061) for mild cognitive impairment (MCI) vs. CN. Despite its strong performance, our model remains highly efficient, with only 1.56 million parameters, roughly 40 times fewer than ResNet-34 (63M) and Swin UNETR (61.98M). These findings demonstrate the potential of cross-attention fusion for improving AD diagnosis while maintaining computational efficiency.
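The size comparison can be checked with a few lines of arithmetic. The sketch below uses only the parameter counts quoted in the abstract; the variable names are illustrative.

```python
# Parameter counts as quoted in the abstract.
ours = 1.56e6
baselines = {"ResNet-34": 63e6, "Swin UNETR": 61.98e6}

# Ratio of each baseline's parameter count to the proposed model's.
ratios = {name: count / ours for name, count in baselines.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
# ResNet-34: 40.4x
# Swin UNETR: 39.7x
```

Both baselines are therefore about 40 times larger than the proposed model, with ResNet-34 slightly above that factor and Swin UNETR slightly below it.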
