Learning to reason about rare diseases through retrieval-augmented agents
Ha Young Kim, Jun Li, Ana Beatriz Solana, Carolin M. Pirkl, Benedikt Wiestler, Julia A. Schnabel, Cosmin I. Bercea
TL;DR
The paper addresses the difficulty of diagnosing rare brain diseases under data scarcity by proposing RADAR, a retrieval-augmented diagnostic reasoning framework that draws on external medical knowledge via Radiopaedia and FAISS in a model-agnostic, multi-agent system. RADAR uses three agents (an initial doctor, a retrieval-augmented reasoning module, and a final doctor) to iteratively generate hypotheses, ground the reasoning in retrieved evidence, and output a primary diagnosis with four differentials, all interpretable and evidence-backed. On the NOVA dataset, RADAR improves accuracy consistently across multiple LLM backbones, with gains of up to 10.2% and notably stronger benefits for open-source models, while also providing literature-grounded explanations that enhance trustworthiness. The work demonstrates a practical path toward trustworthy AI in data-scarce medical imaging by integrating retrieval with reasoning. It currently relies on radiologist-provided captions, and future work should address direct image understanding.
Abstract
Rare diseases represent the long tail of medical imaging, where AI models often fail due to the scarcity of representative training data. In clinical workflows, radiologists frequently consult case reports and literature when confronted with unfamiliar findings. Following this line of reasoning, we introduce RADAR, Retrieval Augmented Diagnostic Reasoning Agents, an agentic system for rare disease detection in brain MRI. Our approach gives AI agents access to external medical knowledge by embedding both case reports and literature with sentence transformers and indexing them with FAISS to enable efficient similarity search. The agent retrieves clinically relevant evidence to guide diagnostic decision making on unseen diseases, without the need for additional training. Designed as a model-agnostic reasoning module, RADAR can be seamlessly integrated with diverse large language models, consistently improving their rare pathology recognition and interpretability. On the NOVA dataset comprising 280 distinct rare diseases, RADAR achieves up to a 10.2% performance gain, with the strongest improvements observed for open-source models such as DeepSeek. Beyond accuracy, the retrieved examples provide interpretable, literature-grounded explanations, highlighting retrieval-augmented reasoning as a powerful paradigm for low-prevalence conditions in medical imaging.
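The embed, index, and retrieve flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a toy bag-of-words embedding and a brute-force cosine search stand in for the sentence transformer and the FAISS index, so the example runs with the Python standard library alone. The knowledge-base entries and the names `KNOWLEDGE_BASE`, `embed`, `retrieve`, and `build_prompt` are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base. The texts are illustrative placeholders,
# not entries from Radiopaedia or the NOVA dataset.
KNOWLEDGE_BASE = [
    ("Case A", "ring enhancing lesion with restricted diffusion and surrounding edema"),
    ("Case B", "symmetric white matter hyperintensity sparing subcortical fibers"),
    ("Case C", "cerebellar atrophy with cross shaped pontine hyperintensity"),
]


def embed(text):
    """Toy embedding: L2-normalized word counts (stand-in for a sentence transformer)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


# "Index": precomputed embeddings, searched by brute force (stand-in for FAISS).
INDEX = [(title, embed(text)) for title, text in KNOWLEDGE_BASE]


def retrieve(query, k=2):
    """Return the k most similar knowledge-base entries as (title, score) pairs."""
    q = embed(query)
    scored = [(cosine(q, vec), title) for title, vec in INDEX]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(title, score) for score, title in scored[:k]]


def build_prompt(caption, k=2):
    """Attach retrieved evidence to the findings before passing them to an LLM."""
    evidence = retrieve(caption, k)
    lines = [f"- {title} (similarity {score:.2f})" for title, score in evidence]
    return "Findings: " + caption + "\nRetrieved evidence:\n" + "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("pontine hyperintensity with cerebellar atrophy"))
```

Because the retrieval step only builds a prompt, any LLM backbone could consume its output unchanged, which is one way to read the abstract's claim that the module is model-agnostic and needs no additional training.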
