Multimodal Neuroimaging Attention-Based architecture for Cognitive Decline Prediction

Jamie Vo; Naeha Sharif; Ghulam Mubashar Hassan

TL;DR

The paper tackles early prediction of cognitive decline by forecasting progression from cognitively normal (CN) to mild cognitive impairment (MCI) or Alzheimer's Disease (AD) within $10$ years. It uses a multimodal MRI-PET CNN (MNA-net) that combines patch-based feature extraction with a four-headed self-attention fusion to form cross-modal representations. Evaluated on OASIS-3, the approach achieves $83\%$ accuracy, $80\%$ true negative rate (TNR), and $86\%$ true positive rate (TPR), surpassing baselines with gains of $+5\%$ in accuracy and $+10\%$ in TNR, which demonstrates the value of attention-driven multimodal fusion. The study finds that patch-based features and multimodal data yield superior performance, and that the PET modalities contribute to higher specificity; however, the model is computationally intensive and limited by the available data, which suggests it could benefit from transfer learning. Overall, MNA-net highlights the potential of attention-based cross-modal learning for early detection of cognitive impairment and informs future work on patch-level fusion and on integrating more data modalities.
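To make the idea of a four-headed self-attention fusion more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each modality has already been reduced to one feature vector and treats the MRI and PET vectors as a two-token sequence. The feature width `d_model`, the random projection matrices, and the function name `multi_head_self_attention` are all hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads=4):
    """Scaled dot-product self-attention over a (n_tokens, d_model) array.

    Returns the fused tokens and the per-head attention weights.
    """
    n, d = tokens.shape
    d_head = d // num_heads

    def split(x):
        # (n, d) -> (heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, n, d_head)
    merged = heads.transpose(1, 0, 2).reshape(n, d)       # concatenate heads
    return merged @ w_o, weights


# Hypothetical inputs: one feature vector per modality (assumed width 16).
rng = np.random.default_rng(0)
d_model = 16
mri_feat = rng.normal(size=d_model)
pet_feat = rng.normal(size=d_model)
tokens = np.stack([mri_feat, pet_feat])  # shape (2, d_model)

# Random projections stand in for learned weights.
w_q, w_k, w_v, w_o = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)

fused, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o)
print(fused.shape, attn.shape)
```

Unlike plain concatenation, each output token here is a weighted mixture of both modalities, so the MRI representation can depend on the PET features and vice versa. In a trained model the projections would be learned and the fused tokens passed to a classifier head.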

Abstract

The early detection of Alzheimer's Disease (AD) is imperative to ensure early treatment and improve patient outcomes. There has consequently been extensive research into detecting AD and its intermediate phase, mild cognitive impairment (MCI). However, there is very little literature on predicting the conversion from normal cognition to MCI or AD. Recently, multiple studies have applied convolutional neural networks (CNNs) that integrate Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) to classify MCI and AD. However, in these works, the fusion of MRI and PET features is achieved simply through concatenation, resulting in a lack of cross-modal interactions. In this paper, we propose a novel multimodal neuroimaging attention-based CNN architecture, MNA-net, to predict whether cognitively normal (CN) individuals will develop MCI or AD within a period of 10 years. To address the lack of interactions across neuroimaging modalities seen in previous works, MNA-net utilises attention mechanisms to form shared representations of the MRI and PET images. The proposed MNA-net is tested on the OASIS-3 dataset and is able to predict CN individuals who converted to MCI or AD with an accuracy of 83%, a true negative rate of 80%, and a true positive rate of 86%. These new state-of-the-art results reflect improvements of 5% in accuracy and 10% in true negative rate attributable to the use of attention mechanisms. These results demonstrate the potential of the proposed model to predict cognitive impairment, and of attention-based mechanisms in the fusion of different neuroimaging modalities to improve the prediction of cognitive decline.
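For readers less familiar with the three reported metrics, the sketch below shows how accuracy, true positive rate, and true negative rate follow from a binary confusion matrix. It assumes label 1 means "converted to MCI or AD" and label 0 means "remained CN". The function names and the toy labels are hypothetical and are not data from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = converter, 0 = stable CN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def summarize(y_true, y_pred):
    """Return accuracy, TPR (sensitivity) and TNR (specificity)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "tpr": _ratio(tp, tp + fn),  # converters correctly flagged
        "tnr": _ratio(tn, tn + fp),  # stable subjects correctly cleared
    }


# Toy labels for illustration only (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = summarize(y_true, y_pred)
print(metrics)
```

Reporting TPR and TNR alongside accuracy matters here because converters and non-converters may be imbalanced, and accuracy alone can hide poor performance on the smaller class.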

Paper Structure (17 sections, 5 equations, 9 figures, 2 tables)

Introduction
Related Works
Materials and Methods
Data Collection
Subject Selection
Image Processing
Proposed Architecture: MNA-net
Patch-based Feature Extraction
Attention-based Multimodal Feature Fusion
Experimental Settings and Evaluation Metrics
Results and Discussion
Evaluation of Attention Based Mechanisms in Neuroimage Feature Fusion
Ablation study
Evaluation of Patch-based Feature Extraction in classification performance
Evaluation of Multimodal Neuroimages in classification performance
...and 2 more sections

Figures (9)

Figure 1: Example of noise and skull removal from a PIB PET image in grayscale.
Figure 2: Example of data augmentation applied to an MRI image. From left to right: Normal Brain, Affine Transformed Brain, Elastically Deformed Brain
Figure 3: MNA-net architecture
Figure 4: Patch-based Feature Extraction stage: A 3D ResNet-10 architecture where the features from the dense layer prior to the final dense and sigmoid layer (coloured in blue) are extracted and used as inputs for the multimodal attention classification stage.
Figure 5: Attention model architecture: The features from the dense layer prior to the final dense and sigmoid layer (coloured in blue) are extracted and used as inputs for the patch fusion classification stage.
...and 4 more figures
