Agent Context Protocols Enhance Collective Inference

Devansh Bhardwaj, Arjun Beniwal, Shreyas Chaudhari, Ashwin Kalyan, Tanmay Rajpurohit, Karthik R. Narasimhan, Ameet Deshpande, Vishvak Murahari

TL;DR

This work addresses interoperability gaps in multi-agent systems by introducing Agent Context Protocols (ACPs), which pair a persistent Execution Blueprint DAG with standardized inter-agent messages to enable fault-tolerant, long-horizon collective inference. ACPs formalize agent capabilities, task decomposition, and tool orchestration within a domain-agnostic framework, leveraging messages like AGENT_REQUEST, AGENT_RESPONSE, and ASSISTANCE_REQUEST and a suite of standardized error codes. Empirical results show state-of-the-art performance on AssistantBench (28.3% accuracy with domain tools) and best-in-class multimodal report generation, along with ablations highlighting the importance of coordination and fault tolerance. The approach offers a modular, extensible foundation for rapid construction of generalist, domain-adaptable agents with robust error handling and interoperability across diverse tasks.

Abstract

AI agents have become increasingly adept at complex tasks such as coding, reasoning, and multimodal understanding. However, building generalist systems requires moving beyond individual agents to collective inference -- a paradigm where multi-agent systems with diverse, task-specialized agents complement one another through structured communication and collaboration. Today, coordination is usually handled with imprecise, ad-hoc natural language, which limits complex interaction and hinders interoperability with domain-specific agents. We introduce Agent Context Protocols (ACPs): a domain- and agent-agnostic family of structured protocols for agent-agent communication, coordination, and error handling. ACPs combine (i) persistent execution blueprints -- explicit dependency graphs that store intermediate agent outputs -- with (ii) standardized message schemas, enabling robust and fault-tolerant multi-agent collective inference. ACP-powered generalist systems reach state-of-the-art performance: 28.3% accuracy on AssistantBench for long-horizon web assistance and best-in-class multimodal technical reports, outperforming commercial AI systems in human evaluation. ACPs are highly modular and extensible, allowing practitioners to build top-tier generalist agents quickly.
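To make the idea of standardized message schemas concrete, the following is a minimal sketch, not the authors' implementation. Only the three message names (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) come from the summary above; the field names (`step_id`, `payload`, `error_code`), the `run_step` helper, and the `"TOOL_FAILURE"` code are hypothetical placeholders, since the actual schema and error codes are not given here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Optional


class MessageType(Enum):
    # Message names taken from the summary; everything else is assumed.
    AGENT_REQUEST = "AGENT_REQUEST"
    AGENT_RESPONSE = "AGENT_RESPONSE"
    ASSISTANCE_REQUEST = "ASSISTANCE_REQUEST"


@dataclass
class Message:
    type: MessageType
    step_id: str                          # hypothetical: blueprint node this message refers to
    payload: Dict[str, Any] = field(default_factory=dict)
    error_code: Optional[str] = None      # hypothetical placeholder, not a real ACP code


def run_step(step_id: str, tool: Callable[..., Any], inputs: Dict[str, Any]) -> Message:
    """Wrap one tool call in a request/response exchange (illustrative only)."""
    request = Message(MessageType.AGENT_REQUEST, step_id, {"inputs": inputs})
    try:
        output = tool(**request.payload["inputs"])
    except Exception as exc:
        # On failure, ask for help instead of silently dropping the step.
        return Message(
            MessageType.ASSISTANCE_REQUEST,
            step_id,
            {"detail": str(exc)},
            error_code="TOOL_FAILURE",
        )
    return Message(MessageType.AGENT_RESPONSE, step_id, {"output": output})
```

In this sketch a failed tool call yields a structured ASSISTANCE_REQUEST rather than free-form text, which is one plausible way a coordinator could decide whether to retry or reroute the step.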

Paper Structure

This paper contains 42 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: An illustrative overview of our ACP-based system. A complex task $T$ is decomposed into sub-tasks (each handled by an agent $A_i$, having capabilities $o_i$---i.e., specialized tools) and compiled into a DAG-based Execution Blueprint. ACPs then coordinate these specialized agents via structured communication and robust error handling, ensuring each sub-task faithfully adheres to the Execution Blueprint.
  • Figure 2: Sample pages from a multi-modal technical report generated by our ACP-based framework. Textual content, data visualizations, and structured references are combined into a cohesive document spanning multiple sections and > 30 pages. All reports can be found in Appendix \ref{appendix:report_samples}.
  • Figure 3: Heatmap of the average human ratings (0–5) across six key dimensions (D1–D6) and the overall average for multi-modal report generation. “Ours”: ACP (top row) outperforms Gemini and Perplexity, showing particularly strong gains in Coverage (D1) and Presentation Quality (D6).
  • Figure 4: Execution timeline in response to a travel planning query. This figure illustrates the execution timeline for a complex query, depicting parallel execution of independent agents and sequential execution of dependent agents.
  • Figure 5: Execution Blueprint. This diagram illustrates a structured Execution Blueprint visualization, where specialized agents—each executing specific Tool calls (e.g., Reddit, News, Tripadvisor, WeatherAPI, Goodreads)—collaborate via Agent Context Protocols (ACPs) to decompose and execute complex tasks.
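
The execution pattern described in the captions for Figures 1 and 4 (a DAG-based Execution Blueprint whose independent steps run in parallel and whose dependent steps run in sequence) can be sketched as follows. This is an illustrative assumption about how such a blueprint might be scheduled, not the paper's code: `execute_blueprint`, the `deps`/`agents` dictionaries, and the wave-by-wave thread-pool scheduling are all hypothetical, built on Python's standard `graphlib` and `concurrent.futures` modules.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple


def execute_blueprint(
    deps: Dict[str, Set[str]],
    agents: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> Tuple[Dict[str, Any], List[List[str]]]:
    """Run a dependency graph wave by wave.

    deps maps each step to the steps it depends on; agents maps each step
    to a callable that receives the results stored so far.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # raises CycleError if deps is not a DAG
    results: Dict[str, Any] = {}          # persistent store of intermediate outputs
    waves: List[List[str]] = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())   # steps whose prerequisites are all done
            waves.append(ready)
            # Steps in the same wave are independent, so submit them together.
            futures = {s: pool.submit(agents[s], dict(results)) for s in ready}
            for step, future in futures.items():
                results[step] = future.result()
                sorter.done(step)
    return results, waves


# Hypothetical travel-planning blueprint: two independent lookups feed one planner.
deps = {"weather": set(), "reviews": set(), "itinerary": {"weather", "reviews"}}
agents = {
    "weather": lambda r: "sunny",
    "reviews": lambda r: "4.5 stars",
    "itinerary": lambda r: r["weather"] + " / " + r["reviews"],
}
results, waves = execute_blueprint(deps, agents)
```

Here `weather` and `reviews` land in the first wave and run concurrently, while `itinerary` waits for both. Keeping every intermediate output in `results` mirrors the "persistent" aspect of the blueprint, so a failed step could in principle be retried without recomputing its prerequisites.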