MedRAX: Medical Reasoning Agent for Chest X-ray

Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo Wang

TL;DR

MedRAX introduces a specialized, tool-augmented AI agent for chest X-ray interpretation that uses a ReAct loop to orchestrate diverse clinical tools without retraining. It is evaluated on ChestAgentBench, a 2,500-question, multi-competency benchmark derived from Eurorad cases, and on several additional radiology benchmarks, where MedRAX demonstrates state-of-the-art performance in complex, multi-step reasoning tasks. The framework emphasizes modularity, transparency, and practical deployment potential, with a Gradio interface and privacy-conscious deployment options. Overall, the work argues that combining large-scale reasoning with domain-specific tools yields robust, interpretable performance, and it outlines future work on uncertainty, tool optimization, and broader multimodal capabilities.
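The ReAct loop alternates between a reasoning step (thought), a tool call (action), and the tool's result (observation) until the agent can answer. The Python sketch below illustrates only that control flow, under stated assumptions: the language model is replaced by a hypothetical scripted `policy` function, and the two tools (`classify_image`, `segment_lungs`) are stubs returning fixed strings rather than calls to the actual MedRAX tools. The `memory` list stands in for the short-term memory that MedRAX implements with LangChain.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One ReAct iteration: a thought plus either a tool call or a final answer."""
    thought: str
    action: Optional[str] = None        # name of the tool to call, if any
    action_input: str = ""              # argument passed to the tool
    final_answer: Optional[str] = None  # set when the agent is ready to answer


# Hypothetical stub tools; real MedRAX tools wrap trained CXR models.
def classify_image(image_path: str) -> str:
    return "pneumothorax: 0.91, effusion: 0.12"


def segment_lungs(image_path: str) -> str:
    return "left lung area reduced relative to right"


TOOLS: dict[str, Callable[[str], str]] = {
    "classify_image": classify_image,
    "segment_lungs": segment_lungs,
}


def policy(query: str, memory: list[str]) -> Step:
    """Hypothetical stand-in for the LLM: chooses the next step from the memory so far."""
    observations = [m for m in memory if m.startswith("Observation:")]
    if len(observations) == 0:
        return Step("I should check which findings are present.", "classify_image", "cxr.png")
    if len(observations) == 1:
        return Step("Compare the lungs to localize the finding.", "segment_lungs", "cxr.png")
    return Step("The tool outputs agree.", final_answer="Left pneumothorax")


def react_loop(query: str, max_steps: int = 5) -> tuple[str, list[str]]:
    memory: list[str] = [f"User: {query}"]  # short-term memory of the conversation
    for _ in range(max_steps):
        step = policy(query, memory)
        memory.append(f"Thought: {step.thought}")
        if step.final_answer is not None:
            memory.append(f"Answer: {step.final_answer}")
            return step.final_answer, memory
        memory.append(f"Action: {step.action}({step.action_input})")
        observation = TOOLS[step.action](step.action_input)  # run the chosen tool
        memory.append(f"Observation: {observation}")
    return "No answer within step budget", memory


if __name__ == "__main__":
    answer, trace = react_loop("Is there a pneumothorax, and on which side?")
    print("\n".join(trace))
```

Because tools are looked up by name at run time, the loop itself never changes when a tool is added or replaced, which mirrors the "without retraining" property described above. The `max_steps` budget is an assumed safeguard against a policy that never commits to an answer.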

Abstract

Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems. Data and code are publicly available at https://github.com/bowang-lab/MedRAX
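Because ChestAgentBench groups its 2,500 queries into 7 categories, results can be reported both overall and per category. The sketch below shows one way to compute those scores. It assumes, as a hypothetical format not specified here, that each question carries a category label and a single correct option letter, and that the model's prediction is also a single letter. The field names (`category`, `answer`, `prediction`) and the toy records are illustrative only and are not drawn from the benchmark.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    category: str    # hypothetical category label for the question
    answer: str      # correct option letter, e.g. "C"
    prediction: str  # option letter returned by the model


def score(items: list[Item]) -> tuple[float, dict[str, float]]:
    """Return overall accuracy and accuracy per category."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item in items:
        total[item.category] += 1
        # Normalize case and whitespace before comparing option letters.
        if item.prediction.strip().upper() == item.answer.strip().upper():
            correct[item.category] += 1
    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values()) if items else 0.0
    return overall, per_category


if __name__ == "__main__":
    # Toy records for illustration; not benchmark data.
    toy = [
        Item("cat1", "A", "a"),
        Item("cat1", "B", "C"),
        Item("cat2", "D", "D"),
        Item("cat2", "C", "C"),
    ]
    overall, per_category = score(toy)
    print(f"overall={overall:.2f}", per_category)
```

Reporting per-category accuracy alongside the overall figure shows whether a system is strong across all competencies or only in a few large categories.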

Paper Structure

This paper contains 38 sections, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Architecture of MedRAX. The framework implements a ReAct loop that processes user queries by integrating short-term memory (LangChain) and specialized medical tools for visual QA (CheXagent [chexagent], LLaVA-Med [li2024llava]), segmentation (MedSAM [medsam, samma2025medsam2segment3dmedical], ChestX-Det [ChestX-Det, pspnet]), grounding (Maira-2 [maira2]), report generation (a model trained on CheXpert Plus [chexpert, chexpert-plus]), classification (TorchXRayVision [torchxrayvision1, torchxrayvision2]), and image generation (RoentGen [roentgen]).
  • Figure 2: MedRAX Interaction Flow. An example of how MedRAX handles a multi-turn conversation through its ReAct loop (<thought>, <action>, <observation>), along with tool outputs and the final response. For clarity, the production interface shows only tool outputs and agent responses.
  • Figure 3:
  • Figure 4: MedRAX and GPT-4o Case Study. (Case 17576) The correct answer is chest tube. GPT-4o incorrectly identifies it as an endotracheal tube based on its position, while MedRAX correctly identifies the chest tube by integrating multiple tool outputs, even resolving conflicting tool suggestions. (Case 16703) The correct answer is left pneumothorax. GPT-4o misdiagnoses right-sided pneumonia/edema, while MedRAX correctly identifies the left pneumothorax through sequential tool application for disease detection and comparative lung analysis.
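Figure 1 groups the MedRAX tools by capability. A capability-to-tool registry, like the hypothetical one below, is one way to picture that modularity: adding or swapping a tool means editing an entry rather than retraining anything. The tool names come from the Figure 1 caption; the `ToolSpec` and `ToolRegistry` classes, the capability keys, and the `register`/`tools_for` helpers are illustrative assumptions, not the API of the MedRAX codebase.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str
    capability: str
    description: str = ""


@dataclass
class ToolRegistry:
    """Hypothetical registry mapping a capability to the tools that provide it."""
    by_capability: dict[str, list[ToolSpec]] = field(default_factory=dict)

    def register(self, spec: ToolSpec) -> None:
        # Append to the capability's list, creating the list on first use.
        self.by_capability.setdefault(spec.capability, []).append(spec)

    def tools_for(self, capability: str) -> list[str]:
        return [s.name for s in self.by_capability.get(capability, [])]

    def capabilities(self) -> list[str]:
        return sorted(self.by_capability)


# Tool names as listed in the Figure 1 caption; the keys are assumed labels.
FIGURE_1_TOOLS = {
    "visual_qa": ["CheXagent", "LLaVA-Med"],
    "segmentation": ["MedSAM", "ChestX-Det"],
    "grounding": ["Maira-2"],
    "report_generation": ["CheXpert Plus report model"],
    "classification": ["TorchXRayVision"],
    "image_generation": ["RoentGen"],
}


def build_registry() -> ToolRegistry:
    registry = ToolRegistry()
    for capability, names in FIGURE_1_TOOLS.items():
        for name in names:
            registry.register(ToolSpec(name=name, capability=capability))
    return registry


if __name__ == "__main__":
    registry = build_registry()
    for cap in registry.capabilities():
        print(cap, registry.tools_for(cap))
```

In this sketch, registering an extra classifier, for example `registry.register(ToolSpec("NewClassifier", "classification"))` with a hypothetical tool name, makes it available alongside TorchXRayVision without touching any other entry.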