MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways
Zhen Chen, Zhihao Peng, Xusheng Liang, Cheng Wang, Peigan Liang, Linsheng Zeng, Minjie Ju, Yixuan Yuan
TL;DR
This work addresses the absence of inpatient-specific AI benchmarks and large-scale datasets by introducing IPDS, a MIMIC-IV-derived benchmark covering 9 triage departments, 17 major disease categories, and 16 standardized treatment options. The authors propose MAP, a multi-agent framework with a triage, diagnosis, and treatment team guided by a chief agent, augmented by a record-review module, a trainable retrieval-enhanced generation component, and an expert-guidance mechanism intended to ensure diagnostic rigor. On IPDS, MAP reaches 78.10% diagnostic accuracy, a 25.10% gain over HuatuoGPT2-13B, and 10–12% higher clinical compliance than three board-certified clinicians, suggesting strong potential for real-world inpatient pathway support. The results highlight the importance of comprehensive data integration (medical history, radiology, demographics) and of structured, explainable reasoning in AI-assisted inpatient decision-making, with implications for deployment and future benchmarking in hospital settings.
Abstract
Inpatient pathways demand complex clinical decision-making based on comprehensive patient information, posing critical challenges for clinicians. Despite advances in large language models (LLMs) for medical applications, little research has focused on artificial intelligence (AI) systems for inpatient pathways, owing to the lack of large-scale inpatient datasets. Moreover, existing medical benchmarks typically concentrate on medical question answering and examinations, ignoring the multifaceted nature of clinical decision-making in inpatient settings. To address these gaps, we first developed the Inpatient Pathway Decision Support (IPDS) benchmark from the MIMIC-IV database, encompassing 51,274 cases across nine triage departments and 17 major disease categories, alongside 16 standardized treatment options. We then proposed the Multi-Agent Inpatient Pathways (MAP) framework, which carries out inpatient pathways with three clinician agents: a triage agent that manages patient admission, a diagnosis agent that serves as the primary decision maker in the department, and a treatment agent that provides treatment plans. In addition, the MAP framework includes a chief agent that oversees the inpatient pathway and guides these three clinician agents. Extensive experiments showed that MAP improved diagnosis accuracy by 25.10% compared to the state-of-the-art LLM HuatuoGPT2-13B. Notably, MAP demonstrated significant clinical compliance, outperforming three board-certified clinicians by 10%-12%, establishing a foundation for inpatient pathway systems.
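To make the agent layout described above concrete, the following is a minimal sketch of the control flow only, under stated assumptions. The names `Agent`, `PathwayRecord`, `run_pathway`, and `stub_model` are hypothetical and do not come from the paper. The stub stands in for an LLM call, and the way the chief agent's guidance is passed to each clinician agent is an illustrative guess rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a "model" is any callable mapping a prompt string to a reply string.
Model = Callable[[str], str]

# Which record field each stage fills in (hypothetical naming).
STAGE_FIELD = {"triage": "department", "diagnosis": "diagnosis", "treatment": "treatment"}


@dataclass
class PathwayRecord:
    patient_info: str
    department: str = ""
    diagnosis: str = ""
    treatment: str = ""
    notes: List[str] = field(default_factory=list)


class Agent:
    def __init__(self, role: str, model: Model):
        self.role = role
        self.model = model

    def act(self, context: str, guidance: str = "") -> str:
        # Build a role-tagged prompt and delegate to the underlying model.
        prompt = f"[{self.role}] {guidance}\n{context}".strip()
        return self.model(prompt)


def stub_model(prompt: str) -> str:
    # Placeholder for an LLM call: echoes the role tag so the flow is testable.
    role = prompt.split("]", 1)[0].lstrip("[")
    return f"<{role} output>"


def run_pathway(patient_info: str, agents: Dict[str, Agent], chief: Agent) -> PathwayRecord:
    record = PathwayRecord(patient_info)
    context = patient_info
    for stage in ("triage", "diagnosis", "treatment"):
        # The chief agent reviews the current context and issues guidance first.
        guidance = chief.act(f"stage={stage}\n{context}")
        output = agents[stage].act(context, guidance)
        record.notes.append(f"{stage}: {guidance}")
        setattr(record, STAGE_FIELD[stage], output)
        # Later stages see the outputs of earlier ones.
        context = f"{context}\n{stage}: {output}"
    return record
```

Calling `run_pathway` with three `Agent` instances keyed by `"triage"`, `"diagnosis"`, and `"treatment"`, plus a chief `Agent`, returns a `PathwayRecord` whose fields are filled in stage order. Replacing `stub_model` with a real LLM wrapper would be the first step toward anything resembling the framework described in the abstract.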
