An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning

Andrew Zamai, Nathanael Fijalkow, Boris Mansencal, Laurent Simon, Eloi Navet, Pierrick Coupe

TL;DR

This paper tackles the difficulty of differentiating neurodegenerative dementias using MRI by converting high-resolution brain scans into textual radiology reports and leveraging reasoning-capable LLMs to perform differential diagnosis. A modular pipeline (MRI segmentation, volume ratios, normative atrophy modeling, and radiology report generation) provides semantically meaningful inputs to LLMs, while reinforcement learning with Group Relative Policy Optimization (GRPO) trains lightweight models to produce structured, anatomically grounded diagnostic rationales during inference. The study demonstrates that GRPO-fine-tuned 8B models can match or surpass larger models in diagnostic accuracy and generate coherent, hypothesis-driven rationales, outperforming traditional classification-only approaches. This work highlights the potential of inference-time reasoning with interpretable explanations to enhance clinical trust and utility in AI-driven neuroimaging diagnostics, offering a practical framework for transparent, data-efficient medical AI. The core mathematical construct, the Structural Deviation Score (SDS), anchors the qualitative report generation to normative neuroanatomical trajectories, enabling standardized severity mapping across brain regions.
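To make the "group relative" part of GRPO concrete, the following is a minimal sketch of how advantages are commonly computed in GRPO: several responses are sampled for the same prompt, each receives a scalar reward, and each reward is standardized against the group's mean and standard deviation. The function name, the `eps` constant, and the example rewards are illustrative assumptions; the reward function actually used in the paper is not described in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    `rewards` holds one scalar per response sampled for the same prompt.
    `eps` (an assumed constant) avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled diagnostic responses to one report,
# e.g. 1.0 when the top-ranked diagnosis is correct and 0.0 otherwise.
example_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(example_rewards)
```

Responses that score above the group average get a positive advantage and are reinforced, while below-average responses are discouraged. No separate value network is needed, which is one reason the method suits lightweight models.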

Abstract

The differential diagnosis of neurodegenerative dementias is a challenging clinical task, mainly because of the overlap in symptom presentation and the similarity of patterns observed in structural neuroimaging. To improve diagnostic efficiency and accuracy, deep learning-based methods such as Convolutional Neural Networks and Vision Transformers have been proposed for the automatic classification of brain MRIs. However, despite their strong predictive performance, these models find limited clinical utility due to their opaque decision-making. In this work, we propose a framework that integrates two core components to enhance diagnostic transparency. First, we introduce a modular pipeline for converting 3D T1-weighted brain MRIs into textual radiology reports. Second, we explore the potential of modern Large Language Models (LLMs) to assist clinicians in the differential diagnosis between frontotemporal dementia subtypes, Alzheimer's disease, and normal aging based on the generated reports. To bridge the gap between predictive accuracy and explainability, we employ reinforcement learning to incentivize diagnostic reasoning in LLMs. Without requiring supervised reasoning traces or distillation from larger models, our approach enables the emergence of structured diagnostic rationales grounded in neuroimaging findings. Unlike post-hoc explainability methods that retrospectively justify model decisions, our framework generates diagnostic rationales as part of the inference process, producing causally grounded explanations that inform and guide the model's decision-making. In doing so, our framework matches the diagnostic performance of existing deep learning methods while offering rationales that support its diagnostic conclusions.

Paper Structure

This paper contains 31 sections, 1 equation, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Overview of the proposed framework for the automated differential diagnosis of neurodegenerative dementias. 3D T1-weighted brain MRIs are converted into radiology reports and used to prompt an LLM for detailed diagnostic reasoning and a final ranked list of candidate diagnoses.
  • Figure 2: Atrophy estimation via normative modeling. Left: Lifespan curve of left hippocampal volume ratio with normative mean $\mu_{\text{norm}}(a, s)$ (black) and confidence bounds $\pm \sigma_{\text{norm}}(a, s)$ (blue/red). Right: SDS distributions across diagnostic groups, reflecting condition-specific structural deviations.
  • Figure 3: Mapping structural deviation to qualitative severity. Left: Quantitative-to-qualitative conversion of Structural Deviation Scores (SDS) using a seven-point severity scale ranging from severe atrophy to severe enlargement. Right: Example of a (truncated) generated radiology report summarizing anatomical findings by region and hemisphere, using standardized severity descriptors.
  • Figure 4: Prompt used to elicit open-ended diagnostic reasoning from MRI reports, ending in a ranked list of differential diagnoses.
  • Figure 5: Excerpts from the DeepSeek-R1-Distill-Llama-8B-GRPO model. The responses exhibit properties such as evidence-based hypothesis testing, non-linear reasoning, and detailed understanding of expected anatomical regions and atrophy severity.
  • ...and 7 more figures
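
The normative modeling of Figure 2 and the severity mapping of Figure 3 can be illustrated with a short sketch. It assumes that the SDS is a z-score of a region's volume ratio against the normative mean $\mu_{\text{norm}}(a, s)$ and spread $\sigma_{\text{norm}}(a, s)$; the exact formula is not given in this summary. The thresholds and the intermediate labels of the seven-point scale are hypothetical placeholders; only the two endpoints, severe atrophy and severe enlargement, come from the Figure 3 caption.

```python
from bisect import bisect_right


def structural_deviation_score(volume_ratio, mu_norm, sigma_norm):
    """Assumed z-score form of the SDS for one brain structure.

    mu_norm and sigma_norm stand for the normative mean and spread
    evaluated at the subject's own age and sex.
    """
    if sigma_norm <= 0:
        raise ValueError("sigma_norm must be positive")
    return (volume_ratio - mu_norm) / sigma_norm


# Hypothetical cut points (in SDS units) separating seven severity levels.
THRESHOLDS = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
LABELS = [
    "severe atrophy",
    "moderate atrophy",      # assumed intermediate label
    "mild atrophy",          # assumed intermediate label
    "normal",                # assumed intermediate label
    "mild enlargement",      # assumed intermediate label
    "moderate enlargement",  # assumed intermediate label
    "severe enlargement",
]


def severity_label(sds):
    """Map a quantitative SDS to a qualitative severity descriptor."""
    return LABELS[bisect_right(THRESHOLDS, sds)]


# Made-up values for a left hippocampal volume ratio.
sds = structural_deviation_score(volume_ratio=0.20, mu_norm=0.25, sigma_norm=0.02)
finding = f"Left hippocampus: {severity_label(sds)}"
```

Because every region is scored on the same scale, one lookup table can produce the standardized descriptors used in the generated report, whatever the absolute volume of the structure.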