GFE-Mamba: Mamba-based AD Multi-modal Progression Assessment via Generative Feature Extraction from MCI

Zhaojie Fang; Shenghao Zhu; Yifei Chen; Binfeng Zou; Fan Jia; Chang Liu; Xiang Feng; Linwei Qiu; Feiwei Qin; Jin Fan; Changbiao Chu; Changmiao Wang

TL;DR

The paper tackles the challenge of predicting progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) by leveraging multimodal data while addressing the scarcity of perfectly paired MRI and PET data. It introduces GFE-Mamba, a generation-assisted multimodal framework that uses a 3D GAN-ViT to synthesize PET features from MRI, a six-block Mamba classifier to process long input sequences that include assessment-scale information, and Pixel-Level Bi-Cross Attention to fuse modalities at a fine-grained level. The approach achieves state-of-the-art results on one-year and three-year progression datasets derived from ADNI, and comprehensive ablations confirm the contribution of Generative Feature Extraction, the intermediate GAN features, the attention mechanisms, and each data modality. The work advances practical early prediction of AD progression and points toward scalable, integrative diagnostic tools that combine MRI and structured assessment data in clinical workflows.
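The TL;DR credits the Mamba classifier with handling long input sequences. The recurrence that makes this possible can be sketched as a generic Mamba-style selective scan: the step size and the input/output projections depend on the current token, and the hidden state is updated once per position, so cost grows linearly with sequence length. This is a minimal NumPy sketch under stated assumptions, not the authors' six-block classifier. The parameter names (`W_delta`, `b_delta`, `W_B`, `W_C`, `D_skip`) are hypothetical, and a full Mamba block also adds input projections, a short causal convolution, gating, and a parallel scan. How the paper tokenizes scale information is not shown here.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)


def selective_scan(x, A, W_delta, b_delta, W_B, W_C, D_skip):
    """Sequential selective scan over x of shape (L, Dm); returns y of shape (L, Dm).

    Hypothetical parameter layout (illustrative sketch only):
    A:                (Dm, N) per-channel diagonal state matrix with negative entries
    W_delta, b_delta: (Dm, Dm), (Dm,) -> input-dependent step size delta_t
    W_B, W_C:         (Dm, N)         -> input-dependent B_t and C_t
    D_skip:           (Dm,)           -> direct skip connection
    """
    L, Dm = x.shape
    h = np.zeros((Dm, A.shape[1]))                # hidden state: N values per channel
    y = np.empty((L, Dm))
    for t in range(L):
        xt = x[t]                                 # (Dm,)
        delta = softplus(xt @ W_delta + b_delta)  # (Dm,), > 0: token-dependent step size
        B_t = xt @ W_B                            # (N,)
        C_t = xt @ W_C                            # (N,)
        A_bar = np.exp(delta[:, None] * A)        # (Dm, N) zero-order-hold discretization of A
        Bx = (delta[:, None] * B_t[None, :]) * xt[:, None]  # (Dm, N) first-order discretization of B, times x_t
        h = A_bar * h + Bx                        # state update uses only h_{t-1} and x_t
        y[t] = h @ C_t + D_skip * xt              # readout plus skip connection
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, Dm, N = 12, 4, 8
    x = rng.standard_normal((L, Dm))
    A = -np.exp(rng.standard_normal((Dm, N)))     # negative entries -> decaying, stable state
    y = selective_scan(
        x, A,
        W_delta=0.1 * rng.standard_normal((Dm, Dm)), b_delta=np.zeros(Dm),
        W_B=rng.standard_normal((Dm, N)), W_C=rng.standard_normal((Dm, N)),
        D_skip=np.ones(Dm),
    )
    print(y.shape)  # (12, 4)
```

Because each output depends only on the current and earlier tokens, perturbing a late token leaves earlier outputs unchanged; stacking several such blocks (six, per the TL;DR) would deepen the model without changing this causal, linear-time structure.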

Abstract

Alzheimer's Disease (AD) is a progressive, irreversible neurodegenerative disorder that often originates from Mild Cognitive Impairment (MCI). Its progression causes significant memory loss and severely affects patients' quality of life. Clinical trials have consistently shown that early, targeted interventions for individuals with MCI may slow or even prevent the advancement of AD. Research indicates that accurate medical classification requires diverse multimodal data, including detailed assessment scales and neuroimaging techniques such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET). However, collecting all three modalities simultaneously for training is substantially challenging. To address this, we propose GFE-Mamba, a multimodal classifier built on a Generative Feature Extractor (GFE). The intermediate features produced by this Extractor can compensate for the scarcity of PET data and enable deep multimodal fusion within the classifier. Mamba blocks form the classifier's backbone, allowing it to efficiently extract information from long sequences of assessment-scale data. Pixel-Level Bi-Cross Attention further supplements the classifier with pixel-level information from MRI and PET. We also explain our rationale for constructing a cross-temporal progression prediction dataset, and we provide the pre-trained Extractor weights. Our experiments show that GFE-Mamba effectively predicts progression from MCI to AD and outperforms several leading methods in the field. Our source code is available at https://github.com/Tinysqua/GFE-Mamba.
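The abstract does not spell out how Pixel-Level Bi-Cross Attention is computed. As a hedged illustration, the sketch below implements generic bidirectional cross-attention in NumPy: one token stream (assumed here to be the classifier output) attends to pixel tokens obtained by flattening a C×H×W feature map (assumed to come from MRI/PET), and the pixel tokens attend back, with residual connections on both sides. The single-head design, the shapes, the parameter dictionary keys, and the mean-pool logistic head (`binary_head`) are all hypothetical, not the paper's module.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q_src, kv_src, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: q_src tokens attend to kv_src tokens."""
    Q = q_src @ Wq                                      # (Lq, d)
    K = kv_src @ Wk                                     # (Lk, d)
    V = kv_src @ Wv                                     # (Lk, d)
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (Lq, Lk); each row sums to 1
    return weights @ V, weights


def bi_cross_attention(a, b, params):
    """Bidirectional cross-attention with residual connections.

    a: (La, d) tokens from one stream (hypothetically, classifier output tokens)
    b: (Lb, d) tokens from the other (hypothetically, pixel-level MRI/PET tokens)
    """
    a_upd, w_ab = cross_attention(a, b, *params["a_to_b"])  # a queries b
    b_upd, w_ba = cross_attention(b, a, *params["b_to_a"])  # b queries a
    return a + a_upd, b + b_upd, w_ab, w_ba


def pixel_tokens(feature_map):
    """Flatten a (C, H, W) feature map into H*W tokens of dimension C (one per pixel)."""
    C = feature_map.shape[0]
    return feature_map.reshape(C, -1).T


def binary_head(a_out, b_out, w_head, bias=0.0):
    """Hypothetical head: mean-pool both streams, concatenate, apply a logistic output."""
    pooled = np.concatenate([a_out.mean(axis=0), b_out.mean(axis=0)])  # (2d,)
    return 1.0 / (1.0 + np.exp(-(pooled @ w_head + bias)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    params = {
        key: tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        for key in ("a_to_b", "b_to_a")
    }
    a = rng.standard_normal((4, d))                     # e.g., 4 classifier tokens
    b = pixel_tokens(rng.standard_normal((d, 8, 8)))    # 64 pixel tokens
    a_out, b_out, w_ab, w_ba = bi_cross_attention(a, b, params)
    prob = binary_head(a_out, b_out, rng.standard_normal(2 * d) / np.sqrt(2 * d))
    print(a_out.shape, b_out.shape, w_ab.shape, w_ba.shape)  # (4, 16) (64, 16) (4, 64) (64, 4)
```

Running attention in both directions lets each pixel token be re-weighted by the global classifier context, and lets the classifier tokens pick out the most relevant pixels, before a binary decision is made.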

Paper Structure (16 sections, 10 equations, 9 figures, 3 tables)

Introduction
Related Work
Traditional and Machine Learning Prediction Methods
Neural Network Based Prediction Method
Comparison with Existing Work
Methods
3D GAN-ViT on MRI-PET Task
Multimodal Mamba Classifier
Pixel Level Bi-Cross Attention
Experiment
Data Acquisition and Processing
Evaluation Indicators
Experimental Settings
Comparative Experiments
Ablation Study
...and 1 more section

Figures (9)

Figure 1: The overall architecture of GFE-Mamba. It contains a 3D GAN-ViT pre-trained on the MRI-to-PET generation task. 2D latent MRI and PET features extracted from the 3D GAN-ViT are fused with the scale information and fed into the Multimodal Mamba classifier. The classifier output and the pixel-level MRI/PET features then pass through Pixel-Level Bi-Cross Attention to predict a binary classification outcome.
Figure 2: The architecture of 3D GAN-ViT. An encoder compresses MRI data of shape $D \times H \times W$ into a 2D latent MRI representation, which is divided into patches and fed into a Vision Transformer (ViT). The ViT output is reshaped into a latent PET representation, from which a decoder reconstructs the PET. The generated PET and the real PET are both passed to the Discriminator, which assesses generation quality and drives adversarial training.
Figure 3: The framework and component modules of the Mamba Classifier and Pixel-Level Bi-Cross Attention. Part A details the Mamba Classifier and its Mamba module; Part B details the Pixel-Level Bi-Cross Attention module.
Figure 4: Results of the 3D GAN-ViT. Columns, from left to right, show axial slices of the 3D volumes. To exclude the black background at the edges of the volumes, the leftmost column starts at the 16th slice, and each subsequent column advances by 8 slices. Within each column, from top to bottom, are the MRI, the generated PET, and the real PET.
Figure 5: The process of constructing the MCI-AD dataset from the ADNI dataset. The diagnosis at the time of MRI acquisition (time x) is identified first; subjects diagnosed with AD or MCI at time x+1 are then labeled as positive or negative samples, respectively.
...and 4 more figures
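The labeling rule from Figure 5 can be sketched in plain Python. Only the rule "AD at follow-up is positive, MCI at follow-up is negative" comes from the caption; the record format (a per-subject map from visit time in years to diagnosis), the requirement that a subject be MCI at MRI time x, and the exclusion of missing or other follow-up outcomes are assumptions for illustration. The hypothetical `horizon` parameter stands in for the one-year and three-year settings mentioned in the TL;DR.

```python
def progression_label(diagnoses, mri_time, horizon=1):
    """Label one subject for an MCI-to-AD progression dataset (illustrative sketch).

    diagnoses: dict mapping visit time (years, hypothetical) -> "CN", "MCI", or "AD"
    mri_time:  time x at which the MRI was acquired
    horizon:   follow-up offset, e.g. 1 or 3 years
    Returns 1 (positive), 0 (negative), or None (excluded).
    """
    if diagnoses.get(mri_time) != "MCI":       # assumption: subject must be MCI at time x
        return None
    follow_up = diagnoses.get(mri_time + horizon)
    if follow_up == "AD":
        return 1                               # converted to AD: positive sample
    if follow_up == "MCI":
        return 0                               # still MCI: negative sample
    return None                                # missing or other diagnosis: excluded (assumption)


def build_dataset(subjects, horizon=1):
    """subjects: dict subject_id -> (diagnoses, mri_time). Returns [(subject_id, label), ...]."""
    samples = []
    for subject_id, (diagnoses, mri_time) in subjects.items():
        label = progression_label(diagnoses, mri_time, horizon)
        if label is not None:
            samples.append((subject_id, label))
    return samples


if __name__ == "__main__":
    subjects = {
        "S1": ({0: "MCI", 1: "AD"}, 0),            # converts within one year
        "S2": ({0: "MCI", 1: "MCI", 3: "AD"}, 0),  # stable at one year, AD by year three
        "S3": ({0: "CN", 1: "MCI"}, 0),            # not MCI at MRI time
        "S4": ({0: "MCI"}, 0),                     # no follow-up visit
    }
    print(build_dataset(subjects, horizon=1))  # [('S1', 1), ('S2', 0)]
    print(build_dataset(subjects, horizon=3))  # [('S2', 1)]
```

Note that the same subject (S2) can be a negative sample at one horizon and a positive sample at another, which is why separate datasets are built per prediction horizon.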
