MARCO: Multi-Agent Real-time Chat Orchestration

Anubhav Shrimal, Stanley Kanagaraj, Kriti Biswas, Swarnalatha Raghuraman, Anish Nediyanchath, Yi Zhang, Promod Yenigalla

TL;DR

MARCO addresses key challenges in utilizing LLMs for complex, multi-step task execution in a production environment by incorporating robust guardrails to steer LLM behavior, validate outputs, and recover from errors that stem from inconsistent output formatting, function and parameter hallucination, and lack of domain knowledge.

Abstract

Large language model advancements have enabled the development of multi-agent frameworks to tackle complex, real-world problems, such as automating tasks that require interactions with diverse tools, reasoning, and human collaboration. We present MARCO, a Multi-Agent Real-time Chat Orchestration framework for automating tasks using LLMs. MARCO addresses key challenges in utilizing LLMs for complex, multi-step task execution. It incorporates robust guardrails to steer LLM behavior, validate outputs, and recover from errors that stem from inconsistent output formatting, function and parameter hallucination, and lack of domain knowledge. Through extensive experiments, we demonstrate MARCO's superior performance, with 94.48% and 92.74% task-execution accuracy on the Digital Restaurant Service Platform conversations and Retail conversations datasets, respectively, along with a 44.91% latency improvement and a 33.71% cost reduction. We also report the effect of guardrails on performance, along with comparisons of various LLMs, both open-source and proprietary. The modular and generic design of MARCO allows it to be adapted for automating tasks across domains and to execute complex use cases through multi-turn interactions.
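To make the guardrail idea concrete, the following is a minimal sketch of an output check with a corrective retry. It is not MARCO's implementation: the tool registry `TOOLS`, the JSON call format, the helper names `check_output` and `run_with_guardrails`, and the wording of the corrective prompt are all assumptions made for illustration. It only shows how the three error classes named in the abstract (formatting, function hallucination, parameter hallucination) could be detected and fed back to the model.

```python
import json
from typing import Callable, Optional, Tuple

# Hypothetical tool registry (an assumption): function name -> allowed parameters.
TOOLS = {"get_sales_report": {"store_id", "date_range"}}


def check_output(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (call, None) if the output passes all checks, else (None, error)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Formatting error: the output is not parseable at all.
        return None, f"Output is not valid JSON: {exc.msg}"
    if not isinstance(call, dict) or "function" not in call:
        return None, "Output must be a JSON object with a 'function' key."
    name = call["function"]
    if name not in TOOLS:
        # Function hallucination: the model named a tool that does not exist.
        return None, f"Unknown function '{name}'. Allowed: {sorted(TOOLS)}"
    params = call.get("parameters", {})
    if not isinstance(params, dict):
        return None, "'parameters' must be a JSON object."
    unknown = set(params) - TOOLS[name]
    if unknown:
        # Parameter hallucination: the tool exists but the arguments do not.
        return None, f"Unknown parameters {sorted(unknown)} for '{name}'."
    return call, None


def run_with_guardrails(llm: Callable[[str], str], prompt: str,
                        max_retries: int = 2) -> dict:
    """Call the model, validate its output, and retry with the error appended."""
    attempt_prompt = prompt
    error = None
    for _ in range(max_retries + 1):
        call, error = check_output(llm(attempt_prompt))
        if error is None:
            return call
        # Corrective prompt: tell the model what was wrong with its last answer.
        attempt_prompt = (
            f"{prompt}\n\nYour previous response was rejected: {error}\n"
            "Please correct it and respond again."
        )
    raise ValueError(f"Guardrail checks still failing after retries: {error}")
```

Here `llm` is any callable that maps a prompt string to a response string, so the loop can be exercised with a stub before wiring in a real model client.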

Paper Structure

This paper contains 22 sections, 1 equation, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Multi-Agent Conversation Flow in MARCO Framework. This diagram illustrates the complex interactions within the MARCO system as it addresses a user's query about declining sales. It showcases MARCO's orchestration of multiple components including the MARCO Base agent, specialized task agents, deterministic multi-step workflows, data stores, and external tools/APIs. The figure demonstrates MARCO's capability to manage multi-turn communications with both the user and various system components, highlighting its process of task decomposition, information gathering, analysis, and action execution in response to real-world business scenarios.
  • Figure 2: Multi-Agents Hierarchy example for Digital Restaurant Service Platform dataset. A directed acyclic graph in which each agent has its own Task Execution Procedure (TEP) steps, functions, and dependent Sub-Task Agents.
  • Figure 3: Cost ($) of MARCO components for every 5000 requests using various LLMs.
  • Figure 4: Impact of Reflection Prompts on Guardrail Error Recurrence During Retries. This graph compares the number of guardrail errors persisting across multiple retry attempts, with and without the use of reflection prompts. It demonstrates that incorporating reflection prompts significantly reduces error recurrence, typically resolving issues within the first retry. In contrast, retrying without reflection shows a gradual decrease in errors but fails to eliminate them entirely even after four attempts.
  • Figure 5: Effect of temperature hyper-parameter on MARS performance.
  • ...and 1 more figure