MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation

Kai Chen, Xinfeng Li, Tianpei Yang, Hewei Wang, Wei Dong, Yang Gao

TL;DR

MDTeamGPT tackles MDT medical consultation challenges by introducing a self-evolving, multi-agent framework that uses consensus aggregation and a residual discussion structure. It employs CorrectKB and ChainKB to accumulate validated and reflective diagnostic experiences, enabling continual improvement in reasoning and accuracy. Experimental results on MedQA and PubMedQA show strong zero-shot performance and cross-dataset generalization, underscoring the framework's robustness and transferability. The work highlights practical implications for scalable, privacy-conscious MDT support and points to avenues for future enhancement and real-world deployment.

Abstract

Large Language Models (LLMs) have made significant progress in various fields. However, challenges remain in Multi-Disciplinary Team (MDT) medical consultations. Current research enhances reasoning through role assignment, task decomposition, and accumulation of medical experience. Multi-role collaboration in MDT consultations often results in excessively long dialogue histories. This increases the model's cognitive burden and degrades both efficiency and accuracy. Some methods only store treatment histories. They do not extract effective experience or reflect on errors. This limits knowledge generalization and system evolution. We propose a multi-agent MDT medical consultation framework based on LLMs to address these issues. Our framework uses consensus aggregation and a residual discussion structure for multi-round consultations. It also employs a Correct Answer Knowledge Base (CorrectKB) and a Chain-of-Thought Knowledge Base (ChainKB) to accumulate consultation experience. These mechanisms enable the framework to evolve and continually improve diagnosis rationality and accuracy. Experimental results on the MedQA and PubMedQA datasets demonstrate that our framework achieves accuracies of 90.1% and 83.9%, respectively, and that the constructed knowledge bases generalize effectively across test sets from both datasets.
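The role of the two knowledge bases can be illustrated with a minimal sketch. It assumes, for illustration only, that a finished consultation is compared against a reference answer and then stored in either CorrectKB or ChainKB; the abstract states only that the two bases accumulate consultation experience. The names `Consultation`, `KnowledgeBases`, and `record`, and the stored fields, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Consultation:
    """One finished consultation (hypothetical structure)."""
    question: str
    final_answer: str
    reasoning: str  # summarized reasoning from the multi-round discussion


@dataclass
class KnowledgeBases:
    """Two in-memory stores standing in for CorrectKB and ChainKB."""
    correct_kb: list = field(default_factory=list)
    chain_kb: list = field(default_factory=list)

    def record(self, case: Consultation, gold_answer: str) -> str:
        # Assumption: routing is decided by agreement with a reference answer.
        if case.final_answer == gold_answer:
            # Validated experience: keep the question, answer, and reasoning.
            self.correct_kb.append({
                "question": case.question,
                "answer": case.final_answer,
                "reasoning": case.reasoning,
            })
            return "CorrectKB"
        # Reflective experience: keep the flawed reasoning next to the
        # reference answer so later consultations can learn from the error.
        self.chain_kb.append({
            "question": case.question,
            "wrong_answer": case.final_answer,
            "gold_answer": gold_answer,
            "reasoning": case.reasoning,
        })
        return "ChainKB"


kbs = KnowledgeBases()
first = kbs.record(Consultation("Q1", "B", "consensus on B"), gold_answer="B")
second = kbs.record(Consultation("Q2", "A", "consensus on A"), gold_answer="C")
print(first, second, len(kbs.correct_kb), len(kbs.chain_kb))
```

In a fuller system the stored entries would presumably be retrieved for similar future cases; the retrieval step is omitted here because the abstract does not describe it.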

Paper Structure

This paper contains 27 sections, 2 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: The lead physician consolidates and refines the discussion outcomes from the current round of agents by categorizing them into four detailed categories: consistency, conflict, independence, and integration, thereby enhancing the overall diagnostic clarity.
  • Figure 2: Overview of our MDTeamGPT medical consultation framework: (A) strategically assigning specialist doctors based on the patient’s specific condition; (B) orchestrating multi-round collaborative consultations with organized clinical information; (C) summarizing and outputting the final diagnostic and treatment recommendations.
  • Figure 3: Diagram of the residual discussion structure in the MDT medical consultation process.
  • Figure 4: Demonstration of the self-evolving capability of our proposed MDTeamGPT framework. As the number of consultation rounds increases, the framework progressively refines its diagnostic accuracy. (A) presents the performance of our MDTeamGPT tested on MedQA, and (B) illustrates corresponding results on PubMedQA.
  • Figure 5: Accuracy enhancements achieved by MDTeamGPT over single-agent baselines across multiple LLMs. (A) depicts the performance boost on MedQA, and (B) shows analogous gains on PubMedQA. These findings highlight the framework’s adaptability across language models and robust efficacy in medical consultation scenarios.
  • ...and 2 more figures