MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision

Hongjie Zheng, Zesheng Shi, Ping Yi

TL;DR

This work tackles the limitation of autonomous medical AI systems that operate on isolated tasks by introducing MedCoAct, a confidence-aware, dual-agent framework that simulates clinical collaboration between doctor and pharmacist agents for end-to-end diagnosis-to-prescription decisions. It introduces DrugCareQA, a 2,700-case benchmark spanning integrated diagnostic and drug-selection workflows, and demonstrates that role specialization, adaptive query planning, and confidence-aware reflection improve diagnostic and medication accuracy by about 7 percentage points over single-agent baselines. The framework relies on a two-stage, role-aware vector retrieval system and an iterative reflection mechanism to mitigate hallucinations and improve decision quality, with evidence from retrieval quality analyses and ablation studies. The results suggest substantial potential for improved telemedicine and routine clinical scenarios, while highlighting areas for further work in expanding specialties and inter-agent communication to enhance scalability and safety.

Abstract

Autonomous agents utilizing Large Language Models (LLMs) have demonstrated remarkable capabilities in isolated medical tasks like diagnosis and image analysis, but struggle with integrated clinical workflows that connect diagnostic reasoning and medication decisions. We identify a core limitation: existing medical AI systems process tasks in isolation without the cross-validation and knowledge integration found in clinical teams, reducing their effectiveness in real-world healthcare scenarios. To transform the isolation paradigm into a collaborative approach, we propose MedCoAct, a confidence-aware multi-agent framework that simulates clinical collaboration by integrating specialized doctor and pharmacist agents, and present a benchmark, DrugCareQA, to evaluate medical AI capabilities in integrated diagnosis and treatment workflows. Our results demonstrate that MedCoAct achieves 67.58% diagnostic accuracy and 67.58% medication recommendation accuracy, outperforming the single-agent framework by 7.04% and 7.08%, respectively. This collaborative approach generalizes well across diverse medical domains, proving especially effective for telemedicine consultations and routine clinical scenarios, while providing interpretable decision-making pathways.

Paper Structure

This paper contains 35 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Current medical agents lack reflective mechanisms, causing misdiagnosis, hallucination propagation, and incorrect medication recommendations. Benchmarks for joint diagnosis-medication evaluation are also absent.
  • Figure 2: Overview of the DrugCareQA benchmark construction pipeline. The workflow consists of data collection, quality control, annotation process, and evaluation metric design.
  • Figure 3: The framework demonstrates a complete workflow from patient complaints through doctor agent diagnosis to pharmacist agent medication recommendations. Both agents employ the same five-step architecture of planning, query generation, knowledge retrieval, reflection, and answer generation. The system incorporates confidence mechanisms, multi-path intelligent query retrieval, vector search tools, and reflection mechanisms to enable cross-agent collaboration and improve medical accuracy.
  • Figure 4: Top-1 diagnostic accuracy, top-3 diagnostic accuracy, and drug prescription accuracy for MedCoAct and the baselines.
  • Figure 5: Accuracies of Qwen3-4B when responding to patient complaints using documents retrieved by MedCoAct and Single Agentic RAG respectively.
  • ...and 1 more figure
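
The five-step loop described in the Figure 3 caption (planning, query generation, knowledge retrieval, reflection, answer generation) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here (`run_agent`, `make_retriever`, `make_assessor`, the 0.7 threshold, the round limit) is a hypothetical stand-in, the keyword-overlap retriever replaces the vector search the paper describes, and the corpora hold abstract placeholder strings rather than medical knowledge. It only shows how a confidence score could gate further retrieval rounds, and how the doctor agent's output could become the pharmacist agent's input.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    answer: str
    confidence: float
    rounds: int


def make_retriever(corpus):
    """Toy keyword-overlap retriever (assumption: stands in for vector search)."""
    def retrieve(query):
        words = set(query.lower().split())
        return [doc for doc in corpus if words & set(doc.lower().split())]
    return retrieve


def make_assessor(needed, follow_ups):
    """Toy reflection step: confidence grows with the number of distinct documents."""
    def assess(task, evidence):
        confidence = min(1.0, len(set(evidence)) / needed)
        # Propose follow-up queries only while evidence is still insufficient.
        return confidence, ([] if confidence >= 1.0 else list(follow_ups))
    return assess


def answer_from(task, evidence):
    """Toy answer step: read the conclusion off the first retrieved document."""
    return evidence[0].split()[-1] if evidence else "unknown"


def run_agent(task, retrieve, assess, answer, threshold=0.7, max_rounds=3):
    """Plan -> query -> retrieve -> reflect, repeated until confident, then answer."""
    evidence = []
    queries = [task]  # trivial plan: start by querying the task text itself
    confidence = 0.0
    for round_no in range(1, max_rounds + 1):
        for query in queries:
            evidence.extend(retrieve(query))
        confidence, follow_ups = assess(task, evidence)
        if confidence >= threshold or not follow_ups:
            break
        queries = follow_ups  # reflection asked for more evidence
    return AgentResult(answer(task, evidence), confidence, round_no)


# Placeholder corpora, one per role (hypothetical content).
doctor_corpus = [
    "symptom_a is linked to condition_x",
    "symptom_b also points to condition_x",
]
pharmacist_corpus = ["condition_x is treated with drug_y"]

diagnosis = run_agent(
    "symptom_a",
    make_retriever(doctor_corpus),
    make_assessor(needed=2, follow_ups=["symptom_b"]),
    answer_from,
)
# The pharmacist agent runs the same loop, seeded with the doctor's diagnosis.
prescription = run_agent(
    diagnosis.answer,
    make_retriever(pharmacist_corpus),
    make_assessor(needed=1, follow_ups=[]),
    answer_from,
)
print(diagnosis)
print(prescription)
```

In this toy run the doctor agent needs a second retrieval round because its first pass yields only one supporting document, while the pharmacist agent stops after one round. The confidence rule and stopping threshold the paper actually uses are not stated in this section, so the ones above are illustrative only.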