MAC: Masked Agent Collaboration Boosts Large Language Model Medical Decision-Making

Zhihao Peng, Liuxin Bao, Yixuan Yuan

TL;DR

MAC addresses the fragility of multi-agent LLM collaboration in medical decision-making by combining Pareto-optimal agent construction (PAC) with cross-consistency maximization (CCM). It selects a Pareto-optimal, diverse set of agents using four metrics and a coverage strategy, then masks the least consistent agent at each layer to enable adaptive progressive propagation. The framework produces its outputs by aggregating the unmasked agents across layers, which reduces misinformation and computational waste. Experiments on NEJMQA, MMLUPH, and MedQA show MAC achieving substantial gains over baselines and existing MAS approaches, including open-access models operating at 14–32B parameters, and ablations support the contribution of both the PAC and CCM components.
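The Pareto-optimal selection step mentioned above can be illustrated with a small dominance filter. This is a minimal sketch, not the authors' implementation: the function names, the sample pool, and the optimization directions (smaller model size and inference time are better, larger diversity score and throughput ratio are better) are all assumptions made for illustration, and the coverage strategy is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical record per candidate LLM:
# (model size in B params, inference time in s, diversity score, throughput ratio)
Metrics = Tuple[float, float, float, float]


def to_costs(m: Metrics) -> Tuple[float, float, float, float]:
    """Turn all four metrics into costs to minimize.

    Assumption: size and time are minimized, diversity and throughput
    are maximized (so they are negated here).
    """
    size, time_s, diversity, throughput = m
    return (size, time_s, -diversity, -throughput)


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if `a` is no worse than `b` on every metric and strictly better on one."""
    ca, cb = to_costs(a), to_costs(b)
    no_worse = all(x <= y for x, y in zip(ca, cb))
    strictly_better = any(x < y for x, y in zip(ca, cb))
    return no_worse and strictly_better


def pareto_front(pool: Dict[str, Metrics]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, metrics in pool.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in pool.items()
            if other_name != name
        )
    )


# Made-up numbers, purely illustrative.
example_pool = {
    "model_a": (14.0, 2.0, 0.6, 1.0),
    "model_b": (32.0, 4.0, 0.8, 0.9),
    "model_c": (32.0, 5.0, 0.5, 0.8),
}
front = pareto_front(example_pool)
```

In this made-up pool, `model_c` is dominated by `model_b` (same size, but slower, less diverse, and lower throughput), while `model_a` and `model_b` trade size against diversity and therefore both stay on the front.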

Abstract

Large language models (LLMs) have proven effective in artificial intelligence, and multi-agent systems (MAS) that coordinate several LLMs hold considerable promise for healthcare. However, the absence of a systematic pipeline for agent construction and the rigidity of static collaboration patterns leave current MAS-based models vulnerable to collaboration failures, resulting in substantial performance degradation in medical decision-making scenarios. To this end, we propose a novel Masked Agent Collaboration (MAC) framework that combines Pareto-optimal agent construction with a cross-consistency maximization mechanism to achieve adaptive progressive propagation of collaborative information, boosting medical decision-making capacity. Specifically, we first conduct a Pareto-frontier factors analysis of the LLM pool over four key factors: model size, inference time, diversity score, and throughput ratio. An LLM's diversity score is derived from the similarity between pairs of its own outputs. This analysis identifies Pareto-optimal models that balance efficiency and capability, and these models are selected as collaborative agents, reflecting the fundamental trade-offs inherent in practical LLM deployment. Afterward, we measure the pairwise similarity between the outputs of the collaborative agents to determine their cross-consistency values, and mask out the agent with the lowest cross-consistency value to eliminate the output that is most likely to be semantically inconsistent. Finally, the agents collaborate through adaptive progressive propagation: each agent takes the aggregated outputs of the unmasked agents from the previous layer as its input and generates its own output via prompt engineering.
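The masking and propagation steps described in the abstract can be sketched as follows. This is a hedged illustration rather than the paper's algorithm: token-level Jaccard overlap stands in for whatever similarity measure the authors use, cross-consistency is assumed to be the mean similarity to the other agents' outputs, the prompt template is invented, and returning the unmasked outputs of the final layer is an assumed aggregation choice. All function names are hypothetical.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # hypothetical: maps a prompt to a text response


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (assumption): overlap of lowercase token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cross_consistency(outputs: List[str]) -> List[float]:
    """Mean similarity of each output to every other agent's output."""
    k = len(outputs)
    if k < 2:
        return [1.0] * k
    return [
        sum(jaccard(outputs[i], outputs[j]) for j in range(k) if j != i) / (k - 1)
        for i in range(k)
    ]


def unmasked_indices(outputs: List[str]) -> List[int]:
    """Mask the single agent with the lowest cross-consistency value."""
    scores = cross_consistency(outputs)
    masked = min(range(len(outputs)), key=scores.__getitem__)
    return [i for i in range(len(outputs)) if i != masked]


def propagate(question: str, agents: List[Agent], num_layers: int) -> List[str]:
    """Run layered collaboration, feeding only unmasked outputs forward."""
    outputs = [agent(question) for agent in agents]
    for _ in range(num_layers - 1):
        kept = unmasked_indices(outputs)
        context = "\n".join(outputs[i] for i in kept)
        # Invented prompt template, for illustration only.
        prompt = f"{question}\n\nPrevious responses:\n{context}"
        outputs = [agent(prompt) for agent in agents]
    # Assumed final step: return the unmasked outputs of the last layer.
    return [outputs[i] for i in unmasked_indices(outputs)]


# Stub agents that ignore the prompt, so the example runs without any model.
stub_agents: List[Agent] = [
    lambda p: "B aspirin",
    lambda p: "B aspirin",
    lambda p: "D weather",
]
final_outputs = propagate("Which drug is indicated?", stub_agents, num_layers=2)
```

With these stubs, the third agent shares no tokens with the other two, so it receives the lowest cross-consistency value and is masked at every layer. In practice each stub would be replaced by a call to one of the selected LLMs.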

Paper Structure

This paper contains 15 sections, 8 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of (a) single LLM and MAS-based framework architectures, including (b) Debate [estornell2024multi], (c) MoA [wang2025mixtureofagents], and (d) MAC (Ours). Different colors represent different agents with responses, where $\mathbf{L}_i$ and $\mathbf{L}_A$ denote the $i$-th agent and the agent for aggregation, respectively, and $K$ denotes the number of agents in the MAS.
  • Figure 2: Comparison of different LLMs and MAS-based models on NEJMQA along three evaluation dimensions: accuracy (ACC, in percent), occupied memory (in $10^4$ MB), and running time (in seconds). The diameter of each bubble is proportional to the running time.
  • Figure 3: Illustration of the proposed MAC framework, where we achieve the Pareto-optimal agent construction via Pareto-frontier factors analysis and the adaptive progressive propagation via cross-consistency maximization mechanism. It significantly reduces the inconsistency of concatenated outputs while ensuring each LLM generates outputs based exclusively on outputs of unmasked LLMs from the previous layer as a contextual reference rather than considering entire outputs.
  • Figure 4: An overview of the adaptive progressive propagation of MAC, where MAC fosters collaboration through the adaptive exclusion of inconsistent outputs to augment medical decision-making capabilities. The green font highlights correct details, while the red font indicates hallucinations or inaccuracies in LLM responses, and the area covered by the diagonal dashed line denotes that the agent is being masked.
  • Figure 5: Specified prompts for output aggregation and used datasets.
  • ...and 5 more figures