CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration
Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao
TL;DR
CoAct introduces a two-agent hierarchical framework for autonomous LLM collaboration that pairs a global planning agent with a local execution agent to tackle long-horizon, real-world tasks. The global planner drafts macro-level phase plans and subtask descriptions, while the local executor carries out the subtasks and returns execution feedback that can trigger replanning. On the WebArena benchmark, CoAct substantially outperforms ReAct, with improvements of up to ~70% in success rate (SR) when force-stop interventions are used, and the authors' analyses identify planning- and memory-related bottlenecks as opportunities for improvement. The work shows that explicit global-local task decomposition and adaptive replanning make autonomous web navigation more robust, and it points to integrating web-page knowledge and memory into planning as a useful direction.
Abstract
Existing LLMs exhibit remarkable performance on various NLP tasks but still struggle with complex real-world tasks, even when equipped with advanced strategies such as Chain-of-Thought (CoT) and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns of human society to LLM systems. Specifically, CoAct involves two agents: (1) a global planning agent, which comprehends the problem scope, formulates macro-level plans, and provides detailed subtask descriptions to local execution agents as the initial rendition of a global plan; and (2) a local execution agent, which operates within the multi-tier task execution structure and focuses on the detailed execution and implementation of specific tasks within the global plan. Experimental results on the WebArena benchmark show that CoAct can rearrange its process trajectory when it encounters failures and achieves superior performance over baseline methods on long-horizon web tasks. Code is available at https://github.com/xmhou2002/CoAct.
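The plan-execute-replan loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the CoAct implementation: the `Phase` and `Feedback` types, the `run_global_local` function, the `max_replans` limit, and the toy web task are all assumptions, and the two LLM-backed agents are replaced by plain Python functions so the control flow runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Phase:
    """A macro-level phase of the global plan, split into subtask descriptions."""
    description: str
    subtasks: List[str]


@dataclass
class Feedback:
    """What the local executor reports back after attempting one subtask."""
    success: bool
    message: str


# Hypothetical interfaces: in CoAct both roles are LLM-backed agents;
# here they are plain callables so the loop is self-contained.
Planner = Callable[[str, List[str]], List[Phase]]
Executor = Callable[[str], Feedback]


def run_global_local(task: str, plan: Planner, execute: Executor,
                     max_replans: int = 3) -> Tuple[bool, List[str]]:
    """Plan globally, execute locally, and replan whenever a subtask fails."""
    history: List[str] = []  # execution feedback shared with the planner
    for _ in range(max_replans + 1):
        phases = plan(task, history)  # (re)plan using all feedback so far
        failed = False
        for phase in phases:
            for subtask in phase.subtasks:
                fb = execute(subtask)
                status = "ok" if fb.success else "FAILED"
                history.append(f"{subtask}: {status} ({fb.message})")
                if not fb.success:
                    failed = True
                    break  # abandon the current plan
            if failed:
                break
        if not failed:
            return True, history
    return False, history  # replanning budget exhausted


# Toy stand-ins for the two agents on a hypothetical web task.
def toy_planner(task: str, history: List[str]) -> List[Phase]:
    if any("FAILED" in entry for entry in history):
        # Revised plan: route around the failed step via site search.
        return [Phase("search", ["open search page", "query 'order #42'"]),
                Phase("report", ["read order status"])]
    return [Phase("navigate", ["open orders menu", "click order #42"]),
            Phase("report", ["read order status"])]


def toy_executor(subtask: str) -> Feedback:
    if subtask == "click order #42":  # simulate a broken link on the page
        return Feedback(False, "element not found")
    return Feedback(True, "done")


if __name__ == "__main__":
    ok, log = run_global_local("check status of order #42", toy_planner, toy_executor)
    print(ok)
    for line in log:
        print(line)
```

In this toy run the first plan fails at `click order #42`; the planner sees the failure in `history`, switches to the search route, and the run ends with `ok == True` after five executed subtasks. Passing the full feedback history back to the planner, rather than only the last error, is one simple way to let the global agent rearrange the trajectory; a real system would also need limits on history length and on executor steps.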
