MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation

Kai Chen; Xinfeng Li; Tianpei Yang; Hewei Wang; Wei Dong; Yang Gao

TL;DR

MDTeamGPT tackles MDT medical consultation challenges by introducing a self-evolving, multi-agent framework that uses consensus aggregation and a residual discussion structure. It employs CorrectKB and ChainKB to accumulate validated and reflective diagnostic experiences, enabling continual improvement in reasoning and accuracy. Experimental results on MedQA and PubMedQA show strong zero-shot performance and cross-dataset generalization, underscoring the framework's robustness and transferability. The work highlights practical implications for scalable, privacy-conscious MDT support and points to avenues for future enhancement and real-world deployment.

Abstract

Large Language Models (LLMs) have made significant progress in various fields, but challenges remain in Multi-Disciplinary Team (MDT) medical consultations. Current research enhances reasoning through role assignment, task decomposition, and accumulation of medical experience. However, multi-role collaboration in MDT consultations often produces excessively long dialogue histories, which increases the model's cognitive burden and degrades both efficiency and accuracy. In addition, some methods only store treatment histories without extracting effective experience or reflecting on errors, which limits knowledge generalization and system evolution. To address these issues, we propose an LLM-based multi-agent MDT medical consultation framework. It uses consensus aggregation and a residual discussion structure for multi-round consultations, and employs a Correct Answer Knowledge Base (CorrectKB) and a Chain-of-Thought Knowledge Base (ChainKB) to accumulate consultation experience. These mechanisms enable the framework to evolve and continually improve the rationality and accuracy of its diagnoses. Experimental results on the MedQA and PubMedQA datasets show that our framework achieves accuracies of 90.1% and 83.9%, respectively, and that the constructed knowledge bases generalize effectively across the test sets of both datasets.
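
The abstract names three mechanisms: consensus aggregation, a residual discussion structure, and two knowledge bases. The Python sketch below is one possible reading of how they could fit together, not the authors' implementation. Several details are assumptions made only for illustration: agents are plain callables rather than LLM calls, consensus is a simple majority vote, the "residual" context is limited to the aggregated summaries of the last `window` rounds (so the full dialogue history is never replayed), and correct outcomes are routed to CorrectKB while incorrect ones go to ChainKB for later reflection. The names `consult`, `KnowledgeBases`, `record`, `max_rounds`, and `window` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical agent interface: (question, shared_context) -> answer string.
# In a real system each agent would wrap an LLM call with a specialist role.
Agent = Callable[[str, List[str]], str]


@dataclass
class KnowledgeBases:
    """Assumed storage layout: two append-only lists of consultation records."""
    correct_kb: List[dict] = field(default_factory=list)  # validated experience
    chain_kb: List[dict] = field(default_factory=list)    # cases kept for reflection

    def record(self, question: str, answer: str, gold: str, trace: List[str]) -> None:
        entry = {"question": question, "answer": answer, "trace": trace}
        if answer == gold:
            self.correct_kb.append(entry)
        else:
            # Assumption: wrong cases keep the reference answer so that a
            # later reflection step can compare it with the reasoning trace.
            entry["gold"] = gold
            self.chain_kb.append(entry)


def consult(question: str, agents: List[Agent],
            max_rounds: int = 3, window: int = 2) -> Tuple[str, List[str]]:
    """Run multi-round discussion; return the final answer and round summaries."""
    summaries: List[str] = []
    majority = ""
    for round_no in range(1, max_rounds + 1):
        # Agents see only the most recent aggregated summaries,
        # not every message from every earlier round.
        context = summaries[-window:]
        answers = [agent(question, context) for agent in agents]

        # Consensus aggregation, simplified here to a majority vote.
        counts = Counter(answers)
        majority, votes = counts.most_common(1)[0]
        tally = ", ".join(f"{a} x{n}" for a, n in sorted(counts.items()))
        summaries.append(f"round {round_no}: {tally}")

        if votes == len(agents):  # unanimous agreement ends the discussion
            break
    return majority, summaries
```

A short usage example with two toy agents, one of which changes its answer after seeing the first round's summary:

```python
steady = lambda q, ctx: "B"
swayed = lambda q, ctx: "A" if not ctx else "B"

answer, trace = consult("Which option is correct?", [steady, swayed])
kb = KnowledgeBases()
kb.record("Which option is correct?", answer, gold="B", trace=trace)
```

Bounding the context to a fixed number of summaries keeps the prompt size roughly constant across rounds, which is the efficiency concern the abstract raises about long dialogue histories.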
