Learning to reason about rare diseases through retrieval-augmented agents

Ha Young Kim, Jun Li, Ana Beatriz Solana, Carolin M. Pirkl, Benedikt Wiestler, Julia A. Schnabel, Cosmin I. Bercea

TL;DR

The paper addresses the difficulty of diagnosing rare brain diseases under data scarcity by proposing RADAR, a retrieval-augmented diagnostic reasoning framework that draws on external medical knowledge from Radiopaedia, indexed with FAISS, in a model-agnostic, multi-agent system. RADAR uses three agents (an initial doctor, a retrieval-augmented reasoning module, and a final doctor) to iteratively generate hypotheses, ground reasoning in retrieved evidence, and output a primary diagnosis with four differentials, all interpretable and evidence-backed. On the NOVA dataset, RADAR yields consistent accuracy improvements across multiple LLM backbones, with gains of up to 10.2% and notably stronger benefits for open-source models, while also providing literature-grounded explanations that enhance trustworthiness. The work demonstrates a practical path toward trustworthy AI in data-scarce medical imaging by integrating retrieval with reasoning. It currently relies on radiologist-provided captions, and future work should address direct image understanding.
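The three-stage flow summarized above (initial doctor, retrieval-augmented reasoning, final doctor) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_pipeline`, `Diagnosis`, `LLM`, and `Retriever`, the prompt wording, and the one-diagnosis-per-line reply format are all assumptions made for illustration. The language model and the retriever are passed in as plain callables, which mirrors the model-agnostic design only in spirit.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces (not from the paper): an LLM maps a prompt to reply
# text, and a retriever maps a query to a list of evidence snippets.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


@dataclass
class Diagnosis:
    primary: str              # top-ranked diagnosis
    differentials: List[str]  # next-ranked alternatives
    evidence: List[str]       # retrieved snippets shown to the final doctor


def _lines(text: str) -> List[str]:
    """Split a reply into non-empty, stripped lines."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def run_pipeline(caption: str, llm: LLM, retrieve: Retriever,
                 n_differentials: int = 4) -> Diagnosis:
    # Stage 1 (initial doctor): propose candidate diagnoses, one per line.
    hypotheses = _lines(llm(f"List candidate diagnoses for: {caption}"))

    # Stage 2 (retrieval-augmented reasoning): gather evidence per hypothesis.
    evidence: List[str] = []
    for hypothesis in hypotheses:
        evidence.extend(retrieve(f"{hypothesis} {caption}"))

    # Stage 3 (final doctor): rank diagnoses given the caption and evidence.
    prompt = (
        f"Findings: {caption}\n"
        "Evidence:\n" + "\n".join(evidence) + "\n"
        "Rank the most likely diagnoses, one per line."
    )
    ranked = _lines(llm(prompt))
    if not ranked:
        raise ValueError("final doctor returned no diagnosis")
    return Diagnosis(ranked[0], ranked[1:1 + n_differentials], evidence)
```

With a real backbone, `llm` would wrap a chat-completion call and `retrieve` would query the evidence index; both are left abstract here so the sketch runs with simple stand-in functions.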

Abstract

Rare diseases represent the long tail of medical imaging, where AI models often fail due to the scarcity of representative training data. In clinical workflows, radiologists frequently consult case reports and literature when confronted with unfamiliar findings. Following this line of reasoning, we introduce RADAR, Retrieval-Augmented Diagnostic Reasoning Agents, an agentic system for rare disease detection in brain MRI. Our approach gives AI agents access to external medical knowledge by embedding both case reports and literature with sentence transformers and indexing them with FAISS to enable efficient similarity search. The agent retrieves clinically relevant evidence to guide diagnostic decision-making on unseen diseases, without the need for additional training. Designed as a model-agnostic reasoning module, RADAR can be seamlessly integrated with diverse large language models, consistently improving their rare pathology recognition and interpretability. On the NOVA dataset comprising 280 distinct rare diseases, RADAR achieves up to a 10.2% performance gain, with the strongest improvements observed for open-source models such as DeepSeek. Beyond accuracy, the retrieved examples provide interpretable, literature-grounded explanations, highlighting retrieval-augmented reasoning as a powerful paradigm for low-prevalence conditions in medical imaging.
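The embed-then-search step described in the abstract can be illustrated with a dependency-free sketch. Two substitutions are made so the example runs on the standard library alone, and both are assumptions rather than the paper's setup: a normalized bag-of-words vector stands in for a sentence-transformer embedding, and a brute-force cosine scan stands in for a FAISS index. The class `ToyIndex`, the helpers `embed` and `cosine`, and the example snippets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy stand-in for a sentence embedding: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    """Inner product of two unit vectors, i.e. their cosine similarity."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class ToyIndex:
    """Brute-force similarity index (a stand-in for FAISS in this sketch)."""

    def __init__(self) -> None:
        self._docs: List[str] = []
        self._vecs: List[Vector] = []

    def add(self, docs: List[str]) -> None:
        # Embed each document once at indexing time.
        for doc in docs:
            self._docs.append(doc)
            self._vecs.append(embed(doc))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        # Score every stored document against the query and keep the top k.
        q = embed(query)
        scored = [(cosine(q, v), d) for v, d in zip(self._vecs, self._docs)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up snippets for illustration; not taken from Radiopaedia or NOVA.
index = ToyIndex()
index.add([
    "ring enhancing lesion with restricted diffusion",
    "symmetric basal ganglia calcification",
    "cerebellar atrophy with hot cross bun sign",
])
hits = index.search("bilateral basal ganglia calcification", k=2)
```

Here `hits` holds `(score, snippet)` pairs ordered by descending similarity, so the basal ganglia snippet ranks first. In a real system, dense embeddings and an approximate or exact vector index would replace both stand-ins, but the interface (add documents once, query by similarity at inference time) stays the same, which is why no additional model training is needed.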

Paper Structure

This paper contains 12 sections, 4 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of the proposed RADAR (Retrieval-Augmented Diagnostic Reasoning Agents) framework. The system employs coordinated agents that retrieve and integrate medical knowledge from external text-based databases to support diagnostic reasoning on rare diseases, achieving up to a 10.2% accuracy gain over non-agentic baselines.
  • Figure 2: Comparison of multi-agent diagnostic reasoning setups. (a) Single-agent system: a single doctor agent generates a diagnosis. (b) Collaborative system: agents exchange independent diagnoses and reach a consensus through discussion rounds. (c) Challenger system: one agent introduces adversarial information to test the robustness of the others. (d) RADAR (ours): a retrieval-augmented framework in which agents access external medical knowledge via Radiopaedia to refine and ground diagnostic reasoning.
  • Figure 3: Examples of the results generated by our RADAR system. The ground-truth diagnosis is marked in bold.