Learning to Use Tools via Cooperative and Interactive Agents

Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, Zhaochun Ren

TL;DR

The paper addresses the fragility of single-agent tool use in LLMs by introducing ConAgents, a cooperative framework that splits tool learning across three specialized agents (Grounding, Execution, and Review), coordinated by automatic and adaptive communication protocols. It further proposes SPAN, a specialized action distillation approach that empowers open-source LLMs by distilling task-solving trajectories into dedicated agents. Extensive experiments on ToolBench and RestBench demonstrate that ConAgents consistently outperforms baselines, with open-source variants achieving strong performance when enhanced by SPAN. The work advances practical tool use by enabling dynamic error calibration, modular specialization, and accessible deployment on open-source models, with implications for robust, real-world AI agent systems.
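A minimal sketch of the data-preparation side of specialized action distillation, under stated assumptions: each recorded task-solving trajectory is assumed to be a list of steps tagged with the agent role that produced it, and "distilling into dedicated agents" is read here as grouping those steps into one prompt/target fine-tuning set per role. The `Step` type, the role names, and the example strings are hypothetical illustrations, not the paper's data schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical role labels for the three specialized agents.
ROLES = ("grounding", "execution", "review")


@dataclass
class Step:
    role: str     # agent that produced this step (assumed annotation)
    context: str  # what the agent was shown at this step
    action: str   # what the teacher model produced: a plan, a program, or a review


def split_by_role(trajectories: list[list[Step]]) -> dict[str, list[dict[str, str]]]:
    """Turn teacher trajectories into one prompt/target dataset per agent role."""
    datasets: dict[str, list[dict[str, str]]] = defaultdict(list)
    for trajectory in trajectories:
        for step in trajectory:
            if step.role not in ROLES:
                raise ValueError(f"unknown role: {step.role}")
            datasets[step.role].append({"prompt": step.context, "target": step.action})
    return dict(datasets)


# Toy trajectory; the endpoint strings are illustrative only.
example = [
    Step("grounding", "task: list top-rated movies", "call GET /movie/top_rated"),
    Step("execution", "plan: call GET /movie/top_rated", "resp = get('/movie/top_rated')"),
    Step("review", "result: HTTP 401", "request lacks an API key; add credentials and retry"),
]
data = split_by_role([example])
print({role: len(rows) for role, rows in data.items()})
# → {'grounding': 1, 'execution': 1, 'review': 1}
```

Each per-role dataset would then be used to fine-tune a separate open-source student model for that role; the fine-tuning step itself is not shown and is not claimed to match the authors' setup.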

Abstract

Tool learning empowers large language models (LLMs) as agents to use external tools and extend their utility. Existing methods employ a single LLM-based agent to iteratively select and execute tools, thereafter incorporating execution results into the next action prediction. Despite their progress, these methods suffer from performance degradation on practical tasks due to: (1) a pre-defined pipeline with restricted flexibility to calibrate incorrect actions, and (2) the difficulty of adapting a general LLM-based agent to perform a variety of specialized actions. To mitigate these problems, we propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration, respectively. ConAgents introduces two communication protocols to enable flexible cooperation among the agents. To effectively generalize ConAgents to open-source models, we also propose specialized action distillation, enhancing their ability to perform specialized actions in our framework. Our extensive experiments on three datasets show that LLMs equipped with ConAgents outperform baselines by a substantial margin (up to a 14% higher success rate).

Paper Structure

This paper contains 49 sections, 9 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Comparison between (a) the existing single-agent tool learning method and (b) our cooperative agent framework ConAgents. ConAgents coordinates three agents through two proposed communication protocols, i.e., automatic and adaptive interaction.
  • Figure 2: Our proposed cooperative and interactive agent framework. The left panel shows the three specialized agents in our framework (see the framework section). The right panel illustrates the two proposed communication protocols that coordinate these agents: automatic and adaptive communication (see the communication protocol section).
  • Figure 3: Qualitative analysis of the maximum interaction turns α and β in our agent communication protocols (see the communication protocol section) on the TMDB dataset.
  • Figure 4: Efficiency analysis of different methods, measured by the average number of consumed tokens.
  • Figure 5: An example illustrating the automatic agent communication in ConAgents. Each turn starts with planning-and-review between the grounding agent and the review agent. Following the plan, the execution agent generates programs to execute tools and calibrates incorrect results using feedback from the review agent. Useful feedback from the review agent is highlighted in red.
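The per-turn flow in the Figure 5 caption, bounded by the maximum interaction turns α and β from Figure 3, can be sketched as two capped review loops. This is a hedged reading of the captions, not the authors' implementation: the agents are stubbed as plain Python callables, the review agent is assumed to return `None` when it accepts an output, and mapping α to the planning-and-review loop and β to the execution-and-calibration loop is an assumption. All function names are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical agent interfaces (assumptions, not the paper's API):
#   grounding(prompt) -> plan text
#   execution(prompt) -> tool result text, e.g. the output of a generated program
#   review(goal, output) -> None to accept, or feedback text requesting a fix
Agent = Callable[[str], str]
Reviewer = Callable[[str, str], Optional[str]]


def bounded_refine(agent: Agent, review: Reviewer, goal: str, max_turns: int) -> tuple[str, int]:
    """Let `agent` revise its output until `review` accepts it or the turn budget runs out."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    prompt = goal
    for turn in range(1, max_turns + 1):
        output = agent(prompt)
        feedback = review(goal, output)
        if feedback is None:
            return output, turn  # accepted by the review agent
        prompt = f"{goal}\nReview feedback: {feedback}"
    return output, max_turns  # budget exhausted: keep the last attempt


def conagents_turn(task: str, grounding: Agent, execution: Agent, review: Reviewer,
                   alpha: int = 3, beta: int = 3) -> tuple[str, str, int]:
    """One assumed protocol turn: plan-and-review (<= alpha), then execute-and-calibrate (<= beta)."""
    plan, plan_turns = bounded_refine(grounding, review, task, alpha)
    result, exec_turns = bounded_refine(execution, review, plan, beta)
    return plan, result, plan_turns + exec_turns


# Toy stubs standing in for LLM-backed agents; endpoint strings are illustrative.
def toy_grounding(prompt: str) -> str:
    # Picks a wrong endpoint first, then corrects itself once it sees feedback.
    return "GET /movie/top_rated" if "Review feedback" in prompt else "GET /movies"


def toy_execution(prompt: str) -> str:
    return f"200 OK from {prompt.splitlines()[0]}"


def toy_review(goal: str, output: str) -> Optional[str]:
    return "no such endpoint; use /movie/top_rated" if output == "GET /movies" else None


plan, result, turns = conagents_turn("list top-rated movies", toy_grounding, toy_execution, toy_review)
print(plan, "|", result, "|", turns)
# → GET /movie/top_rated | 200 OK from GET /movie/top_rated | 3
```

Returning the last attempt when the turn budget is exhausted, rather than failing, is one possible design choice; how the paper handles that case is not stated in the captions above.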