MedChain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence

Jie Liu, Wenxuan Wang, Zizhan Ma, Guolin Huang, Yihang Su, Kao-Jung Chang, Wenting Chen, Haoliang Li, Linlin Shen, Michael Lyu

TL;DR

MedChain addresses the gap between existing medical benchmarks and real-world clinical decision-making (CDM) by introducing a large-scale, five-stage CDM benchmark that captures personalization, interactivity, and sequentiality. It further proposes MedChain-Agent, a multi-agent framework with a feedback loop and MedCase-RAG for dynamic case-based retrieval and iterative refinement. Experimental results show that MedChain-Agent outperforms baselines and generalizes across base LLMs, with notable gains in complex stages such as history-taking and examination. This work enables more realistic evaluation and development of AI-driven clinical decision support, with the potential to improve patient care by better emulating real-world workflows.

Abstract

Clinical decision making (CDM) is a complex, dynamic process crucial to healthcare delivery, yet it remains a significant challenge for artificial intelligence systems. While Large Language Model (LLM)-based agents have been tested on general medical knowledge using licensing exams and knowledge question-answering tasks, assessment of their CDM performance in real-world scenarios is limited by the lack of comprehensive test datasets that mirror actual medical practice. To address this gap, we present MedChain, a dataset of 12,163 clinical cases that covers five key stages of the clinical workflow. MedChain distinguishes itself from existing benchmarks through three key features of real-world clinical practice: personalization, interactivity, and sequentiality. Further, to tackle real-world CDM challenges, we propose MedChain-Agent, an AI system that integrates a feedback mechanism and an MCase-RAG module to learn from previous cases and adapt its responses. MedChain-Agent demonstrates remarkable adaptability in gathering information dynamically and handling sequential clinical tasks, significantly outperforming existing approaches.
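The abstract says MedChain-Agent learns from previous cases through case retrieval (called MedCase-RAG in the TL;DR and MCase-RAG in the abstract). The summary does not describe its embedding model, case schema, or prompt format, so the sketch below is only an illustration of generic case-based retrieval under stated assumptions. It uses bag-of-words cosine similarity as a stand-in for learned embeddings, a hypothetical case record with `description` and `outcome` fields, and hypothetical helper names (`retrieve_similar_cases`, `build_prompt`). The feedback loop is not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on any run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def bow_vector(text: str) -> Counter:
    """Bag-of-words term counts; an assumed stand-in for a learned text embedding."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar_cases(query: str, case_base: list[dict], k: int = 2) -> list[dict]:
    """Hypothetical helper: return the k past cases most similar to the query text."""
    q = bow_vector(query)
    scored = [(cosine(q, bow_vector(case["description"])), i) for i, case in enumerate(case_base)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest similarity first; ties keep input order
    return [case_base[i] for _, i in scored[:k]]


def build_prompt(query: str, retrieved: list[dict]) -> str:
    """Hypothetical helper: prepend retrieved cases as in-context examples for an LLM agent."""
    examples = "\n".join(
        f"Past case: {c['description']}\nOutcome: {c['outcome']}" for c in retrieved
    )
    return f"{examples}\nCurrent case: {query}\nDecision:"


# Toy case base (illustrative only, not MedChain data).
cases = [
    {"description": "pelvic mass ovarian cyst abdominal pain", "outcome": "gynecology referral"},
    {"description": "chest pain shortness of breath on exertion", "outcome": "cardiology referral"},
    {"description": "persistent cough fever night sweats", "outcome": "respiratory referral"},
]
top = retrieve_similar_cases("abdominal pain with ovarian mass", cases, k=1)
print(top[0]["outcome"])  # gynecology referral
```

A production retriever would swap `bow_vector` for a dense embedding and an approximate nearest-neighbor index. The ranking-then-prompting structure is the part this sketch is meant to show.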

Paper Structure

This paper contains 38 sections, 29 figures, 6 tables.

Figures (29)

  • Figure 1: Demonstration of error propagation in CDM on MedChain. Starting with 2,362 initial cases, the diagram illustrates how diagnostic errors cascade through five clinical stages. Cases with incorrect diagnoses carry problematic information forward to subsequent stages, leading to a cumulative decrease in accuracy. After the treatment stage is completed, we count, for each stage, the cases that remained correct through every consecutive stage up to that stage. Our MedChain-Agent achieves the best CDM performance compared with other state-of-the-art (SOTA) methods.
  • Figure 2: MedChain pipeline. MedChain is composed of a sequential medical process comprising specialty referral, history-taking, examination, diagnosis, and treatment.
  • Figure 3: MedChain-Agent framework. The figure depicts a cyclical, feedback-driven multi-task medical system in which decisions are supported by retrieving similar past cases from a medical database.
  • Figure 4: Case Report "77_Ovarian Carcinoid with Mature Cystic Teratoma: A Case Report."
  • Figure 5: Chinese version of the case report with corresponding medical imaging. "77_Ovarian Carcinoid with Mature Cystic Teratoma: A Case Report." (a), (b), and (c) Medical imaging. (d) Chinese version.
  • ...and 24 more figures
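Figure 1's caption describes a chained counting rule: after the treatment stage, a case counts at a given stage only if it was correct at every stage up to and including that one. Below is a minimal Python sketch of that rule. It assumes each case has already been scored as one boolean per stage, with stage order taken from Figure 2. The function names are hypothetical, and the four toy cases are invented for illustration, not MedChain results.

```python
# Stage order follows the Figure 2 caption.
STAGES = ["specialty referral", "history-taking", "examination", "diagnosis", "treatment"]


def chained_correct_counts(results: list[list[bool]]) -> list[int]:
    """For each stage j, count cases judged correct at every stage from the first through j."""
    counts = [0] * len(STAGES)
    for per_stage in results:
        if len(per_stage) != len(STAGES):
            raise ValueError("each case needs one correctness flag per stage")
        for j, ok in enumerate(per_stage):
            if not ok:
                break  # an earlier error removes the case from this and all later stages
            counts[j] += 1
    return counts


def chained_accuracy(results: list[list[bool]]) -> list[float]:
    """Chained counts divided by the total number of cases (all zeros for an empty input)."""
    total = len(results)
    if total == 0:
        return [0.0] * len(STAGES)
    return [c / total for c in chained_correct_counts(results)]


# Toy correctness flags for four hypothetical cases (not MedChain data).
toy = [
    [True, True, True, True, True],
    [True, True, False, True, True],
    [True, False, True, True, True],
    [False, True, True, True, True],
]
for stage, acc in zip(STAGES, chained_accuracy(toy)):
    print(f"{stage}: {acc:.2f}")
# specialty referral: 0.75
# history-taking: 0.50
# examination: 0.25
# diagnosis: 0.25
# treatment: 0.25
```

This rule is why the curve in Figure 1 can only fall from stage to stage. In the toy data, three of four cases are correct at examination when that stage is scored in isolation, but only one case remains correct through every stage up to examination.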