Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations

Xudong Han, Xianglun Gao, Xiaoyi Qu, Zhenyu Yu

TL;DR

The study addresses the lack of quantitative consensus measurement and decision traceability in oncology MDT consultations by introducing a seven-role multi-agent system that builds a structured consensus matrix and uses reinforcement learning to optimize treatment recommendations. It combines role-specific LLMs with a retrieval-augmented evidence pipeline, GRADE-based evaluation, and explicit evidence chains to support each decision. Across five benchmark datasets, the approach achieves higher accuracy, a higher consensus coefficient (Kendall's $W$), and higher expert ratings than the baselines, which supports the value of formal consensus modelling and evidence-grounded explainability in medical decision support. The work also acknowledges limitations, including model biases, the cadence of guideline updates, multimorbidity cases, and high resource demands, and it underscores the need for human oversight in clinical deployment.

Abstract

Multidisciplinary team (MDT) consultations are the gold standard for cancer care decision-making, yet current practice lacks structured mechanisms for quantifying consensus and ensuring decision traceability. We introduce a Multi-Agent Medical Decision Consensus Matrix System that deploys seven specialized large language model agents, including an oncologist, a radiologist, a nurse, a psychologist, a patient advocate, a nutritionist and a rehabilitation therapist, to simulate realistic MDT workflows. The framework incorporates a mathematically grounded consensus matrix that uses Kendall's coefficient of concordance to objectively assess agreement. To further enhance treatment recommendation quality and consensus efficiency, the system integrates reinforcement learning methods, including Q-Learning, PPO and DQN. Evaluation across five medical benchmarks (MedQA, PubMedQA, DDXPlus, MedBullets and SymCat) shows substantial gains over existing approaches, achieving an average accuracy of 87.5% compared with 83.8% for the strongest baseline, a consensus achievement rate of 89.3% and a mean Kendall's W of 0.823. Expert reviewers rated the clinical appropriateness of system outputs at 8.9/10. The system guarantees full evidence traceability through mandatory citations of clinical guidelines and peer-reviewed literature, following GRADE principles. This work advances medical AI by providing structured consensus measurement, role-specialized multi-agent collaboration and evidence-based explainability to improve the quality and efficiency of clinical decision-making.
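
The consensus measure named in the abstract, Kendall's coefficient of concordance, can be illustrated with a minimal sketch. It assumes each agent submits a complete, tie-free ranking of the same candidate options and uses the textbook formula $W = 12S / (m^2(n^3 - n))$, where $m$ is the number of raters, $n$ the number of options, and $S$ the sum of squared deviations of the per-option rank sums from their mean. The function name `kendalls_w`, the example rankings, and the use of $W > 0.7$ as a consensus cutoff (taken from the Figure 3 caption) are illustrative assumptions; the paper's actual consensus matrix, tie handling, and option set are not shown here.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[int]]) -> float:
    """Kendall's W for m raters ranking n items, no tie correction.

    rankings[j][i] is the rank (1 = most preferred) that rater j gives item i.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    m = len(rankings)
    if m < 2:
        raise ValueError("need at least two raters")
    n = len(rankings[0])
    if n < 2:
        raise ValueError("need at least two items")
    expected = list(range(1, n + 1))
    for row in rankings:
        # Each rater must supply a permutation of 1..n (complete, no ties).
        if sorted(row) != expected:
            raise ValueError("each ranking must be a permutation of 1..n")

    # Sum of ranks received by each item across all raters.
    rank_sums = [sum(row[i] for row in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # S: squared deviations of the rank sums from their mean.
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical example: seven role agents rank three treatment options.
agent_rankings = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
    [2, 1, 3],
    [2, 1, 3],
]
w = kendalls_w(agent_rankings)
consensus_reached = w > 0.7  # threshold assumed from the Figure 3 caption
print(f"W = {w:.3f}, consensus reached: {consensus_reached}")
```

$W$ ranges from 0 (no agreement among raters) to 1 (identical rankings), so a single number summarizes how closely the seven roles agree on an ordering of options.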

Paper Structure

This paper contains 31 sections, 19 equations, 6 figures, 5 tables, 2 algorithms.

Figures (6)

  • Figure 1: Motivation. Traditional tumor multidisciplinary team (MDT) meetings rely on unstructured discussion among human experts, often lacking quantitative consensus measurement and traceable evidence for treatment decisions. Our framework replaces each MDT role with a specialized large language model agent, aggregates their structured preferences into a mathematically grounded consensus matrix, and uses reinforcement learning together with guideline- and literature-based evidence retrieval to produce oncology treatment recommendations with quantified agreement and full evidence traceability.
  • Figure 2: Multi-Agent Medical Decision Consensus Matrix System Architecture. The system integrates specialized medical role agents, consensus matrix computation, reinforcement learning optimization, and evidence-based explainability mechanisms with specific data flow dimensions and processing stages.
  • Figure 3: Consensus Matrix Performance Analysis: (a) Distribution of Kendall's W coefficients across all evaluation datasets, (b) Consensus achievement rate vs. case complexity scoring, (c) Convergence analysis showing rounds required to achieve consensus ($W > 0.7$) for different clinical scenarios.
  • Figure 4: Ablation Study: Component Contribution Analysis. Each system component demonstrates essential contribution to overall performance, with the complete system achieving 87.5% accuracy and 0.823 consensus coefficient. Removal of any core component results in substantial performance degradation, confirming the necessity of our integrated multi-agent consensus matrix architecture.
  • Figure 5: Computational Performance Analysis. Our method achieves balanced computational efficiency with 45.2s processing time per case and 78.3% GPU utilization, outperforming comparable multi-agent systems while maintaining high throughput. Despite higher memory requirements due to seven specialized agents, the system demonstrates practical scalability for clinical deployment with 79.6 cases per hour processing capacity.
  • ...and 1 more figure