MAC: Masked Agent Collaboration Boosts Large Language Model Medical Decision-Making
Zhihao Peng, Liuxin Bao, Yixuan Yuan
TL;DR
MAC addresses the fragility of multi-agent LLM collaboration in medical decision-making by combining Pareto-frontier-based agent construction with cross-consistency maximization. It selects a Pareto-optimal, diverse set of agents using four metrics and a coverage strategy, then at each layer masks the least consistent agent to enable adaptive progressive propagation. The framework produces robust outputs by aggregating the unmasked agents across layers, reducing misinformation and computational waste. Experiments on NEJMQA, MMLUPH, and MedQA show MAC achieving substantial gains over baselines and existing MAS approaches, including open-access models operating at 14–32B parameters, and ablations support both the Pareto-optimal agent construction (PAC) and cross-consistency maximization (CCM) components.
Abstract
Large language models (LLMs) have proven effective across artificial intelligence, and multi-agent systems (MAS) that coordinate several LLMs hold considerable promise for healthcare. However, current MAS-based models lack a systematic pipeline for agent construction and rely on rigid, static collaboration patterns. This leaves them vulnerable to collaboration failures, which substantially degrade performance in medical decision-making scenarios. To this end, we propose a novel Masked Agent Collaboration (MAC) framework that combines Pareto-optimal agent construction with cross-consistency maximization to achieve adaptive progressive propagation of collaborative information, boosting medical decision-making capacity. Specifically, we first conduct a Pareto-frontier analysis of the LLM pool over four key factors: model size, inference time, diversity score, and throughput ratio, where the diversity score of an LLM is derived from the similarity between pairs of its own outputs. This analysis identifies Pareto-optimal models that balance efficiency and capability; these models are selected as collaborative agents, reflecting the fundamental trade-offs inherent in practical LLM deployment. Afterward, we measure the pairwise similarity between the outputs of the collaborative agents to determine their cross-consistency values, and we mask out the agent with the lowest cross-consistency value to eliminate the output most likely to be semantically inconsistent. Finally, the agents collaborate through adaptive progressive propagation: each agent aggregates the outputs of the unmasked agents from the previous layer as its input and generates its own output via prompt engineering.
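The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the four metrics are oriented so that lower is better (diversity score and throughput ratio are negated), `difflib.SequenceMatcher` stands in for the unspecified similarity measure, cross-consistency is taken as the mean similarity to the other agents, and all function names, model names, and the prompt template are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Assumed metric order: (model size, inference time, -diversity, -throughput).
# Negating the last two makes "lower is better" hold for every entry.
Metrics = Tuple[float, float, float, float]


def dominates(a: Metrics, b: Metrics) -> bool:
    """a dominates b if it is no worse on every metric and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(pool: Dict[str, Metrics]) -> List[str]:
    """Return the names of models that no other model in the pool dominates."""
    return [
        name for name, m in pool.items()
        if not any(dominates(other, m) for o, other in pool.items() if o != name)
    ]


def similarity(a: str, b: str) -> float:
    """Stand-in text similarity in [0, 1]; the paper's measure may differ."""
    return SequenceMatcher(None, a, b).ratio()


def cross_consistency(outputs: Dict[str, str]) -> Dict[str, float]:
    """Mean similarity of each agent's output to every other agent's output."""
    if len(outputs) < 2:
        raise ValueError("need at least two agent outputs")
    return {
        name: sum(similarity(text, t) for n, t in outputs.items() if n != name)
        / (len(outputs) - 1)
        for name, text in outputs.items()
    }


def mask_least_consistent(outputs: Dict[str, str]) -> Tuple[Dict[str, str], str]:
    """Drop the agent with the lowest cross-consistency; return (kept, masked)."""
    scores = cross_consistency(outputs)
    masked = min(scores, key=scores.get)
    kept = {n: t for n, t in outputs.items() if n != masked}
    return kept, masked


def next_layer_prompt(question: str, kept: Dict[str, str]) -> str:
    """Hypothetical prompt that feeds unmasked outputs to the next layer."""
    context = "\n".join(f"- {name}: {text}" for name, text in kept.items())
    return f"Question: {question}\nPrevious-layer answers:\n{context}\nGive your answer."


if __name__ == "__main__":
    # Made-up metric values for three hypothetical models.
    pool = {
        "model_a": (14.0, 2.0, -0.6, -1.0),
        "model_b": (32.0, 3.0, -0.8, -0.9),
        "model_c": (32.0, 4.0, -0.5, -0.8),  # dominated by model_b
    }
    print(pareto_front(pool))

    outputs = {
        "agent_1": "The answer is B, beta-blocker therapy.",
        "agent_2": "The answer is B, beta-blocker therapy.",
        "agent_3": "Unrelated text about something else entirely.",
    }
    kept, masked = mask_least_consistent(outputs)
    print(masked)
    print(next_layer_prompt("Which treatment is indicated?", kept))
```

In this toy run, `model_c` is excluded from the frontier because `model_b` matches or beats it on every metric, and `agent_3` is masked because its output disagrees with the other two. Repeating the masking step layer by layer, with each agent prompted on the surviving outputs, mirrors the progressive propagation described in the abstract.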
