MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow

Ziyue Wang, Junde Wu, Linghan Cai, Chang Han Low, Xihong Yang, Qiaxuan Li, Yueming Jin

TL;DR

Medical diagnosis increasingly relies on integrating multi-modal data, but existing VLM-based approaches struggle with stepwise, evidence-based reasoning and quantitative analysis. MedAgent-Pro introduces a hierarchical, guideline-guided workflow with disease-level planning and patient-level reasoning, augmented by retrieval-based guidance and professional visual tools to compute quantitative indicators and verify reliability at every step. Through extensive experiments on glaucoma, heart disease, chest X-ray, and NEJM cases, it outperforms mainstream VLMs, medical agentic systems, and task-specific models, with strong human expert validation. The approach aligns AI-assisted diagnosis with clinical practice, offering more interpretable, dependable decision support and a clear path toward real-world clinical deployment.

Abstract

In modern medicine, clinical diagnosis relies on the comprehensive analysis of primarily textual and visual data, drawing on medical expertise to ensure systematic and rigorous reasoning. Recent advances in large Vision-Language Models (VLMs) and agent-based methods hold great potential for medical diagnosis, thanks to their ability to effectively integrate multi-modal patient data. However, they often provide direct answers and draw empirically driven conclusions without quantitative analysis, which reduces their reliability and clinical usability. We propose MedAgent-Pro, a new agentic reasoning paradigm that follows the diagnosis principle in modern medicine and decouples the process into sequential components for step-by-step, evidence-based reasoning. The MedAgent-Pro workflow mirrors this principle with a hierarchical diagnostic structure consisting of disease-level standardized plan generation and patient-level personalized step-by-step reasoning. To support disease-level planning, a RAG-based agent retrieves medical guidelines to ensure alignment with clinical standards. For patient-level reasoning, we integrate professional tools such as visual models to enable quantitative assessments. We also verify the reliability of each step to achieve evidence-based diagnosis, enforcing rigorous logical reasoning and a well-founded conclusion. Extensive experiments across a wide range of anatomical regions, imaging modalities, and diseases demonstrate the superiority of MedAgent-Pro over mainstream VLMs, agentic systems, and state-of-the-art expert models. Ablation studies and human evaluation by clinical experts further validate its robustness and clinical relevance. Code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
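
The two-level structure described in the abstract can be pictured with a small sketch. This is not the authors' implementation (see the repository linked above for that); it is a minimal illustration under stated assumptions. The guideline table stands in for RAG retrieval, the tools are plain functions standing in for visual models, and the names `PlanStep`, `plan_for_disease`, `reason_for_patient`, `diagnose`, the toy thresholds, and the majority-vote decision rule are all hypothetical and carry no clinical meaning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PlanStep:
    indicator: str    # quantitative indicator to compute (hypothetical name)
    tool: str         # key of the tool that computes it
    threshold: float  # toy cut-off, NOT a clinical value


@dataclass
class StepResult:
    indicator: str
    value: float
    reliable: bool  # did the step pass verification?
    abnormal: bool  # reliable and at or above the threshold


def plan_for_disease(disease: str, guidelines: Dict[str, List[dict]]) -> List[PlanStep]:
    """Disease-level planning: a dict lookup stands in for guideline retrieval."""
    return [PlanStep(**entry) for entry in guidelines.get(disease, [])]


def reason_for_patient(
    plan: List[PlanStep],
    patient: dict,
    tools: Dict[str, Callable[[dict], float]],
    verify: Callable[[PlanStep, float], bool],
) -> List[StepResult]:
    """Patient-level reasoning: run each planned tool, then verify its output."""
    results = []
    for step in plan:
        value = tools[step.tool](patient)
        reliable = verify(step, value)
        abnormal = reliable and value >= step.threshold
        results.append(StepResult(step.indicator, value, reliable, abnormal))
    return results


def diagnose(results: List[StepResult]) -> str:
    """Assumed decision rule: majority vote over verified steps only."""
    evidence = [r for r in results if r.reliable]
    if not evidence:
        return "inconclusive"
    positives = sum(r.abnormal for r in evidence)
    return "positive" if positives * 2 > len(evidence) else "negative"


# Toy inputs; every name and number below is made up for illustration.
GUIDELINES = {
    "toy_disease": [
        {"indicator": "indicator_a", "tool": "tool_a", "threshold": 0.5},
        {"indicator": "indicator_b", "tool": "tool_b", "threshold": 10.0},
    ]
}
TOOLS = {"tool_a": lambda p: p["a"], "tool_b": lambda p: p["b"]}


def verify_non_negative(step: PlanStep, value: float) -> bool:
    # Assumption: a negative reading signals a failed tool run.
    return value >= 0


def run(patient: dict) -> str:
    plan = plan_for_disease("toy_disease", GUIDELINES)
    return diagnose(reason_for_patient(plan, patient, TOOLS, verify_non_negative))
```

Calling `run({"a": 0.7, "b": 12.0})` exercises both levels: the plan is built once per disease, then each step is computed and verified for the given patient. Keeping the plan separate from the patient data is what lets the same guideline-derived steps be reused across patients, while unreliable steps are excluded from the final decision instead of silently influencing it.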

Paper Structure

This paper contains 31 sections, 5 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Comparison of diagnostic outcomes for two diseases across mainstream VLMs, medical agentic systems, and our proposed MedAgent-Pro workflow.
  • Figure 2: Overview of the MedAgent-Pro framework, which performs diagnosis through a hierarchical structure, with reasoning guided by a VLM supported by a RAG agent and specialized tools.
  • Figure 3: Illustration of the RAG process, which uses a two-step retrieval.
  • Figure 4: A case study for glaucoma diagnosis, which presents a detailed workflow in the MedAgent-Pro framework. The blue text indicates the agents, while the green text indicates the reasoning steps. In the reasoning, the underlined text indicates the clinical indicators identified through analysis.
  • Figure 5: Ablation on quantitative indicator analysis that reveals how segmentation accuracy influences diagnostic outcomes.
  • ...and 3 more figures