AgentCollab: A Self-Evaluation-Driven Collaboration Paradigm for Efficient LLM Agents

Wenbo Gao, Renxi Liu, Xian Wang, Fang Guo, Shuai Yang, Xi Chen, Hui-Ling Zhen, Hanting Chen, Weizhe Lin, Xiaosong Li, Yaoyuan Wang

Abstract

Autonomous agents powered by large language models (LLMs) perform complex tasks through long-horizon reasoning and tool interaction, where a fundamental trade-off arises between execution efficiency and reasoning robustness. Models at different capability-cost levels offer complementary advantages: lower-cost models enable fast execution but may struggle on difficult reasoning segments, while stronger models provide more robust reasoning at higher computational cost. We present AgentCollab, a self-driven collaborative inference framework that dynamically coordinates models with different reasoning capacities during agent execution. Instead of relying on external routing modules, the framework uses the agent's own self-reflection signal to determine whether the current reasoning trajectory is making meaningful progress, and escalates control to a stronger reasoning tier only when necessary. To further stabilize long-horizon execution, we introduce a difficulty-aware cumulative escalation strategy that allocates additional reasoning budget based on recent failure signals. In our experiments, we instantiate this framework using a two-level small-large model setting. Experiments on diverse multi-step agent benchmarks show that AgentCollab consistently improves the accuracy-efficiency Pareto frontier of LLM agents.

Figures (7)

  • Figure 1: Overview of the proposed AgentCollab framework. (a) The agent system first invokes the large model to warm up the interaction and plan the overall strategy. (b) After initialization, the small model performs low-latency routine reasoning steps and (c) conducts self-evaluation progress checks to determine whether escalation is required. (d) When stagnation is detected, control is temporarily transferred to a larger model within a predefined budget to resolve difficult reasoning segments. (e) Once the critical step is completed, control returns to the small model to continue the interaction. (A minimal sketch of this control loop appears after the figure list.)
  • Figure 2: Budget Allocation Strategy. As consecutive failures accumulate, the framework temporarily increases the budget allotted to the larger model, allowing it to provide additional support.
  • Figure 3: Pareto frontier of DDV2 on BrowseComp_zh.
  • Figure 4: Illustrative execution trajectories. The first row shows a representative case, and the second row shows an extreme case. Panels (a) and (c) plot cumulative latency bars and instantaneous TPS curves over reasoning steps, with speedup ratios annotated relative to the Large baseline. Panels (b) and (d) show the corresponding cumulative latency staircase plots.
  • Figure 5: Pareto frontier of AgentCollab within WebSailor on BrowseComp_zh under different values of $k$.
  • ...and 2 more figures
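
To make the workflow in Figures 1 and 2 concrete, the following is a minimal sketch of the self-evaluation-driven escalation loop. It is written against assumed interfaces rather than the paper's actual implementation: `Model`, `is_making_progress` (the self-reflection signal), `done`, `base_budget`, and the slope `k` are all illustrative names, and the budget rule (a base budget plus `k` extra large-model steps per consecutive failure) is one plausible reading of the difficulty-aware cumulative escalation described above.

```python
# Illustrative sketch of the AgentCollab control loop (Figures 1-2).
# All names and the budget rule are assumptions, not the paper's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model:
    """Stand-in for an LLM tier; `step` extends the trajectory by one action."""
    name: str
    step: Callable[[List[str]], str]


def run_agentcollab(
    small: Model,
    large: Model,
    is_making_progress: Callable[[List[str]], bool],  # self-evaluation signal
    done: Callable[[List[str]], bool],
    base_budget: int = 2,  # large-model steps granted per escalation
    k: int = 1,            # extra budget per consecutive failure (Figure 2)
    max_steps: int = 50,
) -> List[str]:
    # (a) The large model warms up the interaction and plans the strategy.
    trajectory: List[str] = [large.step([])]
    consecutive_failures = 0
    for _ in range(max_steps):
        if done(trajectory):
            break
        # (b) The small model handles routine low-latency reasoning steps.
        trajectory.append(small.step(trajectory))
        # (c) Self-evaluation: is the trajectory making meaningful progress?
        if is_making_progress(trajectory):
            consecutive_failures = 0
            continue
        # (d) Stagnation detected: escalate with a difficulty-aware
        #     cumulative budget that grows with recent failures.
        consecutive_failures += 1
        budget = base_budget + k * consecutive_failures
        for _ in range(budget):
            trajectory.append(large.step(trajectory))
            if done(trajectory) or is_making_progress(trajectory):
                break
        # (e) Control returns to the small model on the next outer iteration.
    return trajectory


if __name__ == "__main__":
    # Toy demo with stub models: "progress" on even-length trajectories,
    # "done" once the trajectory holds five or more entries.
    small = Model("small", lambda t: f"small-step-{len(t)}")
    large = Model("large", lambda t: f"large-step-{len(t)}")
    print(run_agentcollab(
        small, large,
        is_making_progress=lambda t: len(t) % 2 == 0,
        done=lambda t: len(t) >= 5,
    ))
```

Resetting the failure counter as soon as the small model makes progress again keeps the escalation budget tied to recent failure signals rather than to the trajectory as a whole, matching the behavior Figure 2 describes.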