Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications
Raphael Shu, Nilaksh Das, Michelle Yuan, Monica Sunkara, Yi Zhang
TL;DR
To address enterprise-scale problem solving, the paper proposes a hierarchical multi-agent collaboration (MAC) framework built on LLM-powered supervisor and specialist agents, with inter-agent communication, payload referencing, and dynamic routing. It introduces assertion-based end-to-end benchmarking across three enterprise domains and reports up to 90% goal success rate (GSR). Payload referencing improves GSR on code-heavy tasks by ~23% and reduces communication overhead by ~27%. The evaluation also shows that routing can reach >90% accuracy with ~350 ms classification latency, and that the framework generally outperforms single-agent baselines and open-source task-automation frameworks. The openly released benchmark data and the end-to-end evaluation methodology offer practical guidance for enterprise deployment and future research.
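Assertion-based end-to-end evaluation can be illustrated with a small sketch. As our own simplifying assumption, each scenario carries a list of assertions about the final outcome, and a scenario counts toward the goal success rate (GSR) only if all of its assertions hold. The `Assertion` and `Scenario` types, the example scenario, and the substring checks are hypothetical stand-ins; this summary does not specify how assertions are judged, so `check` could equally wrap a human or LLM-based judgment.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical: an assertion is a named predicate over a conversation transcript.
@dataclass
class Assertion:
    description: str
    check: Callable[[str], bool]


# Hypothetical: a scenario bundles the assertions that define its goal.
@dataclass
class Scenario:
    name: str
    assertions: List[Assertion]


def scenario_succeeds(scenario: Scenario, transcript: str) -> bool:
    """Assumption: a scenario succeeds only if every one of its assertions holds."""
    return all(a.check(transcript) for a in scenario.assertions)


def goal_success_rate(results: List[bool]) -> float:
    """Fraction of evaluated runs whose goals were fully met (0.0 if none were run)."""
    if not results:
        return 0.0
    return sum(results) / len(results)


# Usage: substring checks stand in for whatever judge the real benchmark uses.
booking = Scenario("book_trip", [
    Assertion("flight booked", lambda t: "flight confirmed" in t),
    Assertion("hotel booked", lambda t: "hotel confirmed" in t),
])
transcripts = [
    "flight confirmed; hotel confirmed",
    "flight confirmed; hotel unavailable",
]
results = [scenario_succeeds(booking, t) for t in transcripts]
print(goal_success_rate(results))  # prints 0.5
```

Requiring every assertion to pass makes GSR strict: a partially completed goal (flight booked, hotel not booked) counts as a failure rather than as partial credit.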
Abstract
AI agents powered by large language models (LLMs) have shown strong capabilities in problem solving. By combining multiple intelligent agents, multi-agent collaboration has emerged as a promising approach to tackle complex, multi-faceted problems that exceed the capabilities of single AI agents. However, designing collaboration protocols and evaluating the effectiveness of these systems remain significant challenges, especially for enterprise applications. This report addresses these challenges by presenting a comprehensive evaluation of coordination and routing capabilities in a novel multi-agent collaboration framework. We evaluate two key operational modes: (1) a coordination mode enabling complex task completion through parallel communication and payload referencing, and (2) a routing mode for efficient message forwarding between agents. We benchmark on a set of handcrafted scenarios from three enterprise domains, which are publicly released with this report. For coordination capabilities, we demonstrate the effectiveness of inter-agent communication and payload referencing mechanisms, achieving end-to-end goal success rates of 90%. Our analysis yields several key findings: multi-agent collaboration enhances goal success rates by up to 70% compared to single-agent approaches in our benchmarks; payload referencing improves performance on code-intensive tasks by 23%; and latency can be substantially reduced with a routing mechanism that selectively bypasses agent orchestration. These findings offer valuable guidance for enterprise deployments of multi-agent systems and advance the development of scalable, efficient multi-agent collaboration frameworks.
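The abstract names payload referencing without detailing how it works. As a minimal sketch under our own assumptions, the idea can be modeled as storing a large payload (for example, generated code) once and letting agents pass a short reference that is expanded only when a message is delivered, so the supervisor need not reproduce the payload token by token. `PayloadStore`, `REF_TAG`, and the `<ref id="..."/>` tag syntax are hypothetical and not taken from the framework.

```python
import re
import uuid
from typing import Dict

# Hypothetical tag format used to point at a stored payload inside a message.
REF_TAG = re.compile(r'<ref id="([^"]+)"\s*/>')


class PayloadStore:
    """Hypothetical store mapping short reference IDs to large payloads such as generated code."""

    def __init__(self) -> None:
        self._payloads: Dict[str, str] = {}

    def put(self, payload: str) -> str:
        """Store a payload once and return a short ID that messages can cite."""
        ref_id = f"payload-{uuid.uuid4().hex[:8]}"
        self._payloads[ref_id] = payload
        return ref_id

    def expand(self, message: str) -> str:
        """Replace each reference tag with its payload; unknown IDs are left unchanged."""
        def substitute(match: "re.Match[str]") -> str:
            return self._payloads.get(match.group(1), match.group(0))
        return REF_TAG.sub(substitute, message)


# Usage: a specialist produces a large code payload, which is stored once.
store = PayloadStore()
code = "\n".join(f"def f{i}(x):\n    return x + {i}" for i in range(20))
ref = store.put(code)

# The supervisor forwards a short message that cites the payload instead of regenerating it.
supervisor_msg = f'Please review these functions: <ref id="{ref}"/>'
delivered = store.expand(supervisor_msg)
print(len(supervisor_msg) < len(delivered))  # prints True
```

Expanding references at delivery time keeps the supervisor's generated text short, which is one plausible way such a mechanism could cut communication overhead and avoid transcription errors in long code; the framework's actual mechanism may differ.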
