Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications

Raphael Shu, Nilaksh Das, Michelle Yuan, Monica Sunkara, Yi Zhang

TL;DR

To address enterprise-scale problem solving, the paper proposes a hierarchical multi-agent collaboration (MAC) framework in which LLM-powered supervisor agents coordinate specialist agents, supported by inter-agent communication, payload referencing, and dynamic routing. It introduces assertion-based end-to-end benchmarking across three enterprise domains and reports up to 90% goal success rate (GSR). Payload referencing improves GSR on code-heavy tasks by ~23% and reduces communication overhead by ~27%. The evaluation also shows that routing can achieve >90% accuracy with ~350 ms classification latency, and that the framework generally outperforms single-agent baselines and open-source task-automation frameworks. The publicly released benchmarking data and the end-to-end evaluation methodology offer practical guidance for enterprise deployment and future research.
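To make the assertion-based metric concrete, here is a minimal Python sketch of how a goal success rate could be computed over benchmark scenarios. It assumes, for illustration only, that a scenario counts as successful when every one of its assertions holds, and it uses plain predicate functions as a stand-in for whatever judge (human or LLM) actually checks the assertions. `Scenario`, `scenario_succeeded`, and `goal_success_rate` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A trajectory is the ordered list of messages exchanged during one scenario run.
Trajectory = List[str]
# Hypothetical: each assertion is a predicate over a trajectory. In practice a
# human or LLM judge might evaluate natural-language assertions instead.
Assertion = Callable[[Trajectory], bool]


@dataclass
class Scenario:
    name: str
    assertions: List[Assertion] = field(default_factory=list)


def scenario_succeeded(scenario: Scenario, trajectory: Trajectory) -> bool:
    """Assumed rule: a scenario succeeds only if every assertion holds."""
    return all(check(trajectory) for check in scenario.assertions)


def goal_success_rate(scenarios: List[Scenario], trajectories: List[Trajectory]) -> float:
    """Fraction of scenarios whose trajectories satisfy all of their assertions."""
    if len(scenarios) != len(trajectories):
        raise ValueError("need exactly one trajectory per scenario")
    if not scenarios:
        return 0.0
    successes = sum(scenario_succeeded(s, t) for s, t in zip(scenarios, trajectories))
    return successes / len(scenarios)


if __name__ == "__main__":
    scenarios = [
        Scenario("refund", [lambda t: any("refund issued" in m for m in t)]),
        Scenario("booking", [lambda t: any("confirmed" in m for m in t),
                             lambda t: len(t) <= 4]),
    ]
    trajectories = [
        ["user: refund please", "agent: refund issued"],
        ["user: book a room", "agent: sorry, no rooms"],
    ]
    print(goal_success_rate(scenarios, trajectories))  # 0.5
```

Treating a scenario as all-or-nothing makes the metric strict: a conversation that satisfies most assertions but misses one still counts as a failure, which matches the end-to-end framing of goal success.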

Abstract

AI agents powered by large language models (LLMs) have shown strong capabilities in problem solving. By combining multiple intelligent agents, multi-agent collaboration has emerged as a promising approach to tackle complex, multi-faceted problems that exceed the capabilities of single AI agents. However, designing the collaboration protocols and evaluating the effectiveness of these systems remain significant challenges, especially for enterprise applications. This report addresses these challenges by presenting a comprehensive evaluation of coordination and routing capabilities in a novel multi-agent collaboration framework. We evaluate two key operational modes: (1) a coordination mode enabling complex task completion through parallel communication and payload referencing, and (2) a routing mode for efficient message forwarding between agents. We benchmark on a set of handcrafted scenarios from three enterprise domains; the scenarios are publicly released with the report. For coordination capabilities, we demonstrate the effectiveness of inter-agent communication and payload referencing mechanisms, achieving end-to-end goal success rates of 90%. Our analysis yields several key findings: multi-agent collaboration enhances goal success rates by up to 70% compared to single-agent approaches in our benchmarks; payload referencing improves performance on code-intensive tasks by 23%; and latency can be substantially reduced with a routing mechanism that selectively bypasses agent orchestration. These findings offer valuable guidance for enterprise deployments of multi-agent systems and advance the development of scalable, efficient multi-agent collaboration frameworks.
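The payload referencing idea can be illustrated with a small Python sketch: when a specialist returns a large payload (such as code), the payload is stored and replaced by a short tag, so the supervisor's context carries only the tag; when the supervisor later cites the tag, it is expanded back to the full content for the receiving agent. The `<code>...</code>` delimiters, the `<payload ref="pN"/>` tag format, and the `PayloadStore` class are all hypothetical choices for this sketch, not the framework's actual format or API.

```python
import re
from typing import Dict

# Hypothetical convention: specialists wrap large payloads in <code>...</code>.
PAYLOAD_PATTERN = re.compile(r"<code>(.*?)</code>", re.DOTALL)
# Hypothetical short reference tag that replaces a stored payload.
REFERENCE_PATTERN = re.compile(r'<payload ref="(p\d+)"/>')


class PayloadStore:
    """Keeps full payloads out of the supervisor's context, keyed by reference id."""

    def __init__(self) -> None:
        self._payloads: Dict[str, str] = {}

    def tag(self, message: str) -> str:
        """Replace each payload in a specialist's reply with a reference tag."""

        def _store(match: "re.Match[str]") -> str:
            ref = f"p{len(self._payloads) + 1}"
            self._payloads[ref] = match.group(1)
            return f'<payload ref="{ref}"/>'

        return PAYLOAD_PATTERN.sub(_store, message)

    def expand(self, message: str) -> str:
        """Restore referenced payloads before a message reaches the next agent."""

        def _restore(match: "re.Match[str]") -> str:
            ref = match.group(1)
            if ref not in self._payloads:
                return match.group(0)  # leave unknown references untouched
            return f"<code>{self._payloads[ref]}</code>"

        return REFERENCE_PATTERN.sub(_restore, message)


if __name__ == "__main__":
    store = PayloadStore()
    coder_reply = "Here is the function: <code>def add(a, b):\n    return a + b</code>"
    print(store.tag(coder_reply))  # Here is the function: <payload ref="p1"/>
    supervisor_msg = 'Test agent, please write tests for <payload ref="p1"/>'
    print(store.expand(supervisor_msg))
```

The supervisor never has to regenerate the code token by token; it only emits the short tag, which is one plausible reason such a mechanism could cut communication overhead on code-heavy tasks.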

Paper Structure

This paper contains 30 sections, 1 equation, 5 figures, and 17 tables.

Figures (5)

  • Figure 1: Illustration of the hierarchical agents approach for multi-agent collaboration. In a centralized hierarchy, a supervisor agent oversees and assigns tasks to specialist agents. The figure demonstrates a multi-layer hierarchy, where an agent can function as both a specialist agent and a supervisor agent.
  • Figure 2: Example of parallel agent communication. In this example, the supervisor agent communicates with multiple agents simultaneously because the tasks can be completed independently.
  • Figure 3: Example of the payload referencing mechanism. In this example, code delivered by the Coder agent is detected and tagged. The supervisor agent can then use the tag as a reference, which is expanded back to the original content before it reaches the Test agent.
  • Figure 4: Dynamic agent routing, where an incoming request can be routed directly to a specialist agent, whose messages are relayed back to the user.
  • Figure 5: Overview of end-to-end assertion-based benchmarking with scenarios and assertions.
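The dynamic routing shown in Figure 4 can be sketched as a classifier that either sends a request straight to one specialist, relaying that specialist's reply to the user, or falls back to full supervisor orchestration when the intent is ambiguous. The paper reports an LLM-based classifier; this sketch swaps in a simple keyword lookup purely so the example runs without a model. The `Router` class, its `keywords` table, and the "exactly one match" rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# An agent is modeled as a function from a request to a reply.
Agent = Callable[[str], str]


@dataclass
class Router:
    specialists: Dict[str, Agent]
    supervisor: Agent
    # Hypothetical keyword -> specialist table; a stand-in for an LLM classifier.
    keywords: Dict[str, str]

    def classify(self, request: str) -> Optional[str]:
        """Return a specialist name if exactly one matches, else None."""
        text = request.lower()
        matches = {name for kw, name in self.keywords.items() if kw in text}
        return matches.pop() if len(matches) == 1 else None

    def handle(self, request: str) -> str:
        target = self.classify(request)
        if target is not None:
            # Direct route: bypass orchestration and relay the specialist's reply.
            return self.specialists[target](request)
        # Ambiguous or unmatched request: fall back to supervisor orchestration.
        return self.supervisor(request)


if __name__ == "__main__":
    router = Router(
        specialists={
            "billing": lambda r: "billing: " + r,
            "travel": lambda r: "travel: " + r,
        },
        supervisor=lambda r: "supervisor: " + r,
        keywords={"invoice": "billing", "refund": "billing", "flight": "travel"},
    )
    print(router.handle("Where is my invoice?"))  # billing: Where is my invoice?
    print(router.handle("Refund my flight"))      # supervisor: Refund my flight
```

Falling back to the supervisor on ambiguity keeps routing errors from stranding a request with the wrong specialist, at the cost of the extra orchestration latency that direct routing is meant to avoid.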