Learning to Use Tools via Cooperative and Interactive Agents
Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, Zhaochun Ren
TL;DR
The paper addresses the fragility of single-agent tool use in LLMs by introducing ConAgents, a cooperative framework that splits tool learning across three specialized agents (Grounding, Execution, and Review) that coordinate through automatic and adaptive communication protocols. It also proposes SPAN, a specialized action distillation method that distills task-solving trajectories into dedicated agents so that open-source LLMs can fill these roles. Experiments on ToolBench and RestBench show that ConAgents consistently outperforms baselines, and that open-source variants perform strongly when enhanced by SPAN. The work makes tool use more practical by enabling dynamic error calibration, modular specialization, and deployment on open-source models, which bears on building robust agent systems for real-world use.
Abstract
Tool learning empowers large language models (LLMs) as agents to use external tools and extend their utility. Existing methods employ a single LLM-based agent that iteratively selects and executes tools, then incorporates the execution results into its next action prediction. Despite their progress, these methods suffer from performance degradation on practical tasks for two reasons: (1) the pre-defined pipeline offers limited flexibility to calibrate incorrect actions, and (2) it is difficult to adapt a general LLM-based agent to perform a variety of specialized actions. To mitigate these problems, we propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration, respectively. ConAgents introduces two communication protocols to enable flexible cooperation among the agents. To extend ConAgents effectively to open-source models, we also propose specialized action distillation, which enhances their ability to perform the specialized actions in our framework. Our extensive experiments on three datasets show that LLMs equipped with ConAgents outperform baselines by a substantial margin (i.e., up to 14% higher success rate).
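
To make the division of labor concrete, the following is a minimal sketch of a three-role loop for tool selection, tool execution, and action calibration. It is an illustration under stated assumptions, not the authors' implementation: the agents are stubbed as plain Python functions rather than LLM calls, and the names `TOOLS`, `Action`, `grounding_agent`, `execution_agent`, `review_agent`, `solve`, and `max_rounds` are hypothetical. The two communication protocols are not modeled here; the loop only shows how a failed execution can be routed to a separate reviewer and retried.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy tool registry; a real system would call external APIs.
TOOLS: Dict[str, Callable[[dict], object]] = {
    "add": lambda args: args["a"] + args["b"],
}


@dataclass
class Action:
    tool: str
    args: dict


def grounding_agent(task: str) -> Action:
    """Tool selection (stubbed). Deliberately emits a string argument
    so that the review step has an error to calibrate."""
    return Action(tool="add", args={"a": "2", "b": 3})


def execution_agent(action: Action) -> Tuple[object, Optional[str]]:
    """Tool execution: run the selected tool and return (result, error)."""
    try:
        return TOOLS[action.tool](action.args), None
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"


def review_agent(action: Action, error: Optional[str]) -> Optional[Action]:
    """Action calibration (stubbed): return a revised action,
    or None when the execution succeeded and nothing needs fixing."""
    if error is None:
        return None
    fixed = {key: int(value) for key, value in action.args.items()}
    return Action(tool=action.tool, args=fixed)


def solve(task: str, max_rounds: int = 3):
    """Select a tool once, then alternate execution and review until
    the execution succeeds or the round budget is exhausted."""
    action = grounding_agent(task)
    for _ in range(max_rounds):
        result, error = execution_agent(action)
        revised = review_agent(action, error)
        if revised is None:
            return result
        action = revised
    raise RuntimeError("no valid action within max_rounds")
```

In this toy run, the first execution fails with a type error, the reviewer converts the arguments to integers, and the second execution succeeds. Keeping calibration in its own role means the executor never has to judge its own output, which is the separation the abstract motivates.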
