Single-agent or Multi-agent Systems? Why Not Both?
Mingyan Gao, Yanzi Li, Banruo Liu, Yifan Yu, Phillip Wang, Ching-Yu Lin, Fan Lai
TL;DR
The paper challenges the assumption that multi-agent systems (MAS) inherently outperform single-agent systems (SAS), comparing the two empirically across diverse agentic tasks with contemporary LLMs. It formalizes MAS execution as a dependency graph and identifies node-, edge-, and path-level defects that limit MAS benefits, especially as LLM capabilities improve. To improve efficiency while preserving capability, the authors introduce a confidence-guided critical-path augmentation and two hybrid paradigms, agent routing and agent cascade, that selectively offload tasks between MAS and SAS. The hybrid approaches improve accuracy by up to 12% while reducing deployment costs by up to 20%, depending on the application. The work offers practical guidance for designing scalable, cost-aware agentic architectures as LLMs advance.
Abstract
Multi-agent systems (MAS) decompose complex tasks and delegate subtasks to different large language model (LLM) agents and tools. Prior studies have reported superior accuracy for MAS across diverse domains, enabled by long-horizon context tracking and error correction through role-specific agents. However, designing and deploying MAS incurs higher complexity and runtime cost than single-agent systems (SAS). Meanwhile, frontier LLMs, such as OpenAI-o3 and Gemini-2.5-Pro, have rapidly advanced in long-context reasoning, memory retention, and tool usage, mitigating many of the limitations that originally motivated MAS designs. In this paper, we conduct an extensive empirical study comparing MAS and SAS across various popular agentic applications. We find that the benefits of MAS over SAS diminish as LLM capabilities improve, and we propose efficient mechanisms to pinpoint the error-prone agent in MAS. Furthermore, the performance discrepancy between MAS and SAS motivates our design of a hybrid agentic paradigm, request cascading between MAS and SAS, to improve both efficiency and capability. Our design improves accuracy by 1.1-12% while reducing deployment costs by up to 20% across various agentic applications.
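To make the idea of request cascading concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the cheaper system is tried first and that each system returns a confidence score alongside its answer; the abstract does not specify the cascade direction or the escalation criterion, so both are assumptions here. All names (AgentResult, cascade, stub_sas, stub_mas, the 0.7 threshold, and the cost units) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResult:
    answer: str
    confidence: float  # hypothetical self-reported score in [0, 1]
    cost: float        # hypothetical cost units (e.g., dollars or tokens)


# An agentic system is modeled as a function from a request to a result.
AgentSystem = Callable[[str], AgentResult]


def cascade(
    request: str,
    first: AgentSystem,
    fallback: AgentSystem,
    threshold: float = 0.7,  # assumed escalation threshold
) -> AgentResult:
    """Try `first`; escalate to `fallback` only when confidence is low."""
    initial = first(request)
    if initial.confidence >= threshold:
        return initial
    escalated = fallback(request)
    # The total cost includes the failed first attempt.
    return AgentResult(
        answer=escalated.answer,
        confidence=escalated.confidence,
        cost=initial.cost + escalated.cost,
    )


def stub_sas(request: str) -> AgentResult:
    """Stand-in for a single-agent system: confident only on short requests."""
    confident = len(request.split()) <= 5
    return AgentResult("SAS: " + request, 0.9 if confident else 0.4, 1.0)


def stub_mas(request: str) -> AgentResult:
    """Stand-in for a multi-agent system: more expensive, always confident."""
    return AgentResult("MAS: " + request, 0.95, 5.0)


if __name__ == "__main__":
    for req in (
        "What is 2 + 2?",
        "Plan a three-city trip and book every leg within budget",
    ):
        result = cascade(req, stub_sas, stub_mas)
        print(result.answer, result.cost)
```

In this sketch, a request pays the cost of the second system only when the first is unsure, which is how a cascade can lower average deployment cost; an escalated request pays for both attempts, so the savings depend on how often the first system is confident.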
