
MediHive: A Decentralized Agent Collective for Medical Reasoning

Xiaoyang Wang, Christopher C. Yang

Abstract

Large language models (LLMs) have revolutionized medical reasoning tasks, yet single-agent systems often falter on complex, interdisciplinary problems requiring robust handling of uncertainty and conflicting evidence. Multi-agent systems (MAS) leveraging LLMs enable collaborative intelligence, but prevailing centralized architectures suffer from scalability bottlenecks, single points of failure, and role confusion in resource-constrained environments. Decentralized MAS (D-MAS) promise enhanced autonomy and resilience via peer-to-peer interactions, but their application to high-stakes healthcare domains remains underexplored. We introduce MediHive, a novel decentralized multi-agent framework for medical question answering that integrates a shared memory pool with iterative fusion mechanisms. MediHive deploys LLM-based agents that autonomously self-assign specialized roles, conduct initial analyses, detect divergences through conditional evidence-based debates, and locally fuse peer insights over multiple rounds to achieve consensus. Empirically, MediHive outperforms single-LLM and centralized baselines on MedQA and PubMedQA datasets, attaining accuracies of 84.3% and 78.4%, respectively. Our work advances scalable, fault-tolerant D-MAS for medical AI, addressing key limitations of centralized designs while demonstrating superior performance in reasoning-intensive tasks.

Paper Structure

This paper contains 20 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the MediHive framework's decentralized workflow for medical question answering, illustrated with a sample query from the MedQA dataset. The process unfolds in four key steps: (A) Query initialization and autonomous role assignment among LLM agents via the shared memory pool; (B) Initial assessments by specialized agents, including preliminary diagnoses with confidence scores and reasoning; (C) Disagreement detection triggering conditional multi-round debates, followed by local fusion of insights; and (D) Consensus aggregation through confidence-weighted voting, culminating in the final compiled reasoning and output.
  • Figure 2: Illustrative output of the Role Assignment phase for the sample MedQA query from Fig. 1, showing initial proposals and peer-aware refinements posted to $\mathcal{M}$.
  • Figure 3: Comparison of a centralized architecture and the proposed MediHive framework, highlighting differences in coordination, resilience, and adaptability.
  • Figure 4: Performance on MedQA and PubMedQA datasets as a function of the number of collaborating agents ($N$). The optimal accuracy for both datasets is achieved with $N=5$.
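The four-step workflow enumerated in the Figure 1 caption — peer assessments posted to the shared memory pool, disagreement detection, and confidence-weighted consensus — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `assess` interface, the memory-pool layout, and the divergence test are all assumptions for exposition.

```python
from collections import defaultdict

def run_round(agents, memory):
    # (B) Each agent reads the shared pool and posts its
    # (answer, confidence) assessment back to it.
    for agent in agents:
        memory[agent.name] = agent.assess(memory)

def divergent(memory):
    # (C) A debate round is triggered whenever the agents'
    # current top answers disagree.
    return len({answer for answer, _ in memory.values()}) > 1

def consensus(memory):
    # (D) Confidence-weighted voting: sum confidence per answer
    # and return the highest-scoring one.
    scores = defaultdict(float)
    for answer, confidence in memory.values():
        scores[answer] += confidence
    return max(scores, key=scores.get)
```

In this reading, the conditional debates of step (C) amount to repeating `run_round` while `divergent` holds (up to a round budget), with each agent's `assess` free to revise its answer in light of what peers have posted to the pool.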