
REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis

Duy-Cat Can, Quang-Huy Tang, Huong Ha, Binh T. Nguyen, Oliver Y. Chén

TL;DR

REMEMBER tackles the challenge of diagnosing neurodegenerative disease from MRI with limited labeled data by combining a contrastive vision-language encoder with a retrieval mechanism that grounds predictions in expert reference cases. It introduces pseudo-text modalities to enrich cross-modal alignment, plus an evidence-encoding module and an attention-based inference head that integrate case-level context. The framework outputs diagnostic predictions together with clinically grounded explanations that cite retrieved cases, aligning AI reasoning with clinical workflows. Experiments on the curated MINDSet dataset and a public Alzheimer's disease (AD) dataset show robust zero- and few-shot performance and improved interpretability, highlighting REMEMBER's potential for real-world, data-scarce neuroimaging diagnosis.
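
The contrastive vision-language alignment mentioned above is commonly trained with a CLIP-style symmetric InfoNCE objective. The sketch below shows that generic objective in plain Python, assuming each image embedding in a batch is paired with exactly one text embedding. The function names and the default temperature of 0.07 are illustrative assumptions; this summary does not state REMEMBER's exact loss, temperature, or how it weights its pseudo-text modalities.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # leave all-zero vectors unchanged
    return [x / norm for x in v]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)  # shift by the max so exp() cannot overflow
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_info_nce(images: List[Vector], texts: List[Vector], temperature: float = 0.07) -> float:
    """CLIP-style loss where images[i] and texts[i] form the i-th positive pair and
    every other pairing in the batch serves as a negative."""
    if len(images) != len(texts) or not images:
        raise ValueError("need equally many, non-empty image and text embeddings")
    img = [l2_normalize(v) for v in images]
    txt = [l2_normalize(v) for v in texts]
    n = len(img)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature for j in range(n)] for i in range(n)]
    # Image-to-text: cross-entropy of each row against its diagonal entry.
    i2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text-to-image: cross-entropy of each column against its diagonal entry.
    t2i = sum(logsumexp([logits[i][j] for i in range(n)]) - logits[j][j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    scans = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_info_nce(scans, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: near zero
    print(symmetric_info_nce(scans, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large
```

Because label-style pseudo-texts (for example, an abnormality type) repeat across a batch, a practical implementation would need to treat samples that share a label as extra positives or deduplicate them; this sketch does not handle that case.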

Abstract

Timely and accurate diagnosis of neurodegenerative disorders, such as Alzheimer's disease, is central to disease management. Existing deep learning models require large-scale annotated datasets and often function as "black boxes". Moreover, datasets in clinical practice are frequently small or unlabeled, which limits what deep learning methods can achieve. Here, we introduce REMEMBER (Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning), a new machine learning framework that facilitates zero- and few-shot Alzheimer's diagnosis from brain MRI scans through a reference-based reasoning process. Specifically, REMEMBER first trains a contrastively aligned vision-text model on expert-annotated reference data and extends it with pseudo-text modalities that encode abnormality types, diagnosis labels, and composite clinical descriptions. Then, at inference time, REMEMBER retrieves similar, human-validated cases from a curated dataset and integrates their contextual information through a dedicated evidence encoding module and an attention-based inference head. This evidence-guided design enables REMEMBER to imitate the real-world clinical decision-making process by grounding predictions in retrieved imaging and textual context. REMEMBER outputs diagnostic predictions alongside an interpretable report, including reference images and explanations aligned with clinical workflows. Experimental results demonstrate that REMEMBER achieves robust zero- and few-shot performance and offers a powerful, explainable framework for neuroimaging-based diagnosis in the real world, especially when data are limited.
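
For intuition only, the retrieve-then-reason inference step described in the abstract can be approximated as a soft k-nearest-neighbour vote: rank reference cases by cosine similarity to the query embedding, turn the top-k similarities into attention weights with a softmax, and sum the weights per label. In REMEMBER the evidence encoder and attention head are learned modules; this sketch replaces them with a fixed similarity softmax, and every name here (`retrieve_top_k`, `attend_and_predict`, the `temperature` value, the case IDs and labels) is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (embedding, diagnosis label, case identifier) for one expert-validated reference case
Reference = Tuple[Sequence[float], str, str]
# (cosine similarity to the query, label, case identifier)
Retrieved = Tuple[float, str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: Sequence[float], references: List[Reference], k: int) -> List[Retrieved]:
    """Return the k reference cases most similar to the query, highest similarity first."""
    scored = [(cosine(query, emb), label, case_id) for emb, label, case_id in references]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def attend_and_predict(
    retrieved: List[Retrieved], temperature: float = 0.1
) -> Tuple[Dict[str, float], List[Tuple[str, str, float, float]]]:
    """Softmax over similarities gives attention weights; a label's probability is the
    total weight of the retrieved cases carrying that label."""
    if not retrieved:
        raise ValueError("need at least one retrieved case")
    top = max(sim for sim, _, _ in retrieved)  # subtract the max for numerical stability
    raw = [math.exp((sim - top) / temperature) for sim, _, _ in retrieved]
    total = sum(raw)
    probs: Dict[str, float] = defaultdict(float)
    evidence = []
    for w, (sim, label, case_id) in zip(raw, retrieved):
        weight = w / total
        probs[label] += weight
        evidence.append((case_id, label, sim, weight))  # what an explanation could cite
    return dict(probs), evidence


if __name__ == "__main__":
    refs = [
        ([1.0, 0.0], "AD", "ref-1"),
        ([0.9, 0.1], "AD", "ref-2"),
        ([0.0, 1.0], "CN", "ref-3"),
    ]
    hits = retrieve_top_k([1.0, 0.05], refs, k=3)
    probs, evidence = attend_and_predict(hits)
    print(max(probs, key=probs.get), evidence)
```

The returned evidence list pairs each retrieved case with its similarity and attention weight, which is the kind of information an interpretable report could use to point at reference cases.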

Paper Structure

This paper contains 43 sections, 19 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An overview of the REMEMBER framework. When receiving a brain MRI scan from a new subject (a query brain scan), REMEMBER embeds the neuroimaging data and compares the resulting embedding to textual anchors and reference cases (confirmed cases most similar to the query brain scan). REMEMBER then encodes the retrieved evidence (both imaging and textual) and uses it to make the final diagnosis via an attention-based inference module.
  • Figure 2: Performance of REMEMBER under few-shot supervision for two representative tasks. Evaluation of REMEMBER with varying numbers of labeled training samples per class ($k \in \{5, 10, 20, 50, 100\}$). Left: Abnormality prediction on the curated MINDSet dataset. Right: Dementia severity prediction on the public dataset. The mean and standard deviation of Precision, Recall, and F1 are reported across 10 runs. REMEMBER achieves robust performance and stable generalization even with limited supervision.
  • Figure 3: Retrieval consistency and similarity distribution. Left: Label consistency of retrieved references across top-$k$ results. Precision, recall, and F1 scores measure how often the brain scans from the retrieved cases share the same abnormality type (AT) or dementia type (DT) as the query brain scan. Right: Distribution of similarity scores. Violin plots compare the similarity scores of retrieved cases with correct vs. incorrect labels in abnormality and dementia tasks.
  • Figure 4: A visualization of REMEMBER's embedding space under different fine-tuning scenarios using t-SNE. Each column shows the model trained on (a) the MINDSet dataset, (b) the public dataset, and (c) both datasets. The top row visualizes embeddings of MINDSet samples, colored by abnormality type and shaped by dementia type. The bottom row shows embeddings of the samples from the public dataset, colored by dementia severity. Models trained on a single dataset capture dataset-specific structure: (a) clearly separates abnormality and dementia types; (b) forms distinct clusters across dementia severity. Joint training (c) preserves strong separation for abnormality and dementia types while partially capturing severity as a smooth progression from non-demented to moderately demented stages.
  • Figure 5: Ablation study on abnormality prediction. Removing components from REMEMBER's evidence-guided inference pipeline (the evidence image, textual modalities, similarity scores, or the attention mechanism) lowers macro-averaged F1, Precision, and Recall. Error bars show standard deviation across 10 runs.
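
Figures 2 and 5 report Precision, Recall, and F1 (macro-averaged in Figure 5) as a mean and standard deviation over 10 runs. A minimal sketch of that bookkeeping follows, assuming the common convention of averaging per-class scores with equal class weights, scoring undefined ratios (0/0) as 0.0, and using the sample standard deviation; the summary does not say which estimator or zero-division rule the authors used, and `macro_prf` and `mean_std` are hypothetical names.

```python
from statistics import mean, stdev
from typing import Hashable, List, Sequence, Tuple


def macro_prf(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1: compute each metric per class, then take
    the unweighted mean over classes. Undefined ratios (0/0) count as 0.0."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred), key=str)  # every class seen in either list
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def mean_std(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one metric across repeated runs."""
    return mean(scores), (stdev(scores) if len(scores) > 1 else 0.0)
```

Macro averaging gives every class equal weight, so rare abnormality or severity classes are not swamped by frequent ones. scikit-learn's `precision_recall_fscore_support` with `average="macro"` computes the same per-class-then-average quantities.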