Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents
Zhengliang Shi, Shen Gao, Lingyong Yan, Yue Feng, Xiuyi Chen, Zhumin Chen, Dawei Yin, Suzan Verberne, Zhaochun Ren
TL;DR
AutoTools addresses the scalability and flexibility gaps in using external tools with LLMs through a two-stage framework. First, the LLM automatically encapsulates tool documentation into callable functions, verifying their syntax and runtime correctness; then it programs executable workflows from the shared function library. To extend effectiveness to smaller open-source LLMs, AutoTools-learning introduces a multi-task synthetic-data approach covering documentation understanding, relevance learning, and function programming, releasing 34k high-quality examples. Empirical results on RestBench, ToolBench, and the new AutoTools-Eval show that AutoTools improves tool encapsulation accuracy and end-to-end tool-use performance, with AutoTools-learning providing additional gains for open-source models. The work delivers a practical pathway for real-world tool use by LLMs and points to future directions in multi-modal, vision-enabled tool agents and broader open-source adoption.
Abstract
Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, enabling them to solve practical tasks. Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning. However, this manual process requires domain expertise and struggles to scale to large toolsets. Additionally, these methods rely heavily on ad-hoc inference techniques or special tokens to integrate free-form LLM generation with tool-calling actions, limiting the LLM's flexibility in handling diverse tool specifications and integrating multiple tools. In this work, we propose AutoTools, a framework that enables LLMs to automate the tool-use workflow. Specifically, the LLM automatically transforms tool documentation into callable functions, verifying syntax and runtime correctness. Then, the LLM integrates these functions into executable programs to solve practical tasks, flexibly grounding tool-use actions into its reasoning processes. Extensive experiments on existing and newly collected, more challenging benchmarks illustrate the superiority of our framework. Inspired by these promising results, we further investigate how to improve the expertise of LLMs, especially open-source LLMs with fewer parameters, within AutoTools. Thus, we propose the AutoTools-learning approach, training the LLMs with three learning tasks on 34k instances of high-quality synthetic data, including documentation understanding, relevance learning, and function programming. Fine-grained results validate the effectiveness of our overall training approach and each individual task. Our methods are an important step towards the use of LLMs for solving real-world tasks with external tools.
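The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the documentation schema (`TOOL_DOCS`), the offline stand-in backend (`FAKE_BACKEND`), and the helpers `draft_function_source`, `call_tool`, `encapsulate`, `build_library`, and `solve_task` are all hypothetical names introduced here. In particular, `draft_function_source` uses a fixed template where the framework would prompt an LLM, and the task program in `solve_task` is written by hand where the framework would have the LLM generate it.

```python
import ast

# Hypothetical tool documentation; the field names are assumptions for this sketch.
TOOL_DOCS = [
    {"name": "search_movie", "params": ["title"], "example": {"title": "Demo"}},
    {"name": "get_cast", "params": ["movie_id"], "example": {"movie_id": 1}},
]

# Offline stand-in for real tool endpoints, so the sketch runs without a network.
FAKE_BACKEND = {
    "search_movie": lambda title: {"movie_id": 1, "title": title},
    "get_cast": lambda movie_id: ["Actor A", "Actor B"] if movie_id == 1 else [],
}


def call_tool(name, **kwargs):
    """Dispatch a tool call to the stand-in backend."""
    return FAKE_BACKEND[name](**kwargs)


def draft_function_source(doc):
    """Placeholder for the LLM call that writes a wrapper from documentation."""
    args = ", ".join(doc["params"])
    kwargs = ", ".join(f"{p}={p}" for p in doc["params"])
    return (
        f"def {doc['name']}({args}):\n"
        f"    return call_tool('{doc['name']}', {kwargs})\n"
    )


def encapsulate(doc):
    """Stage 1: turn one documentation entry into a verified callable."""
    source = draft_function_source(doc)
    ast.parse(source)                 # syntax check: raises SyntaxError if invalid
    namespace = {"call_tool": call_tool}
    exec(source, namespace)           # define the drafted function
    func = namespace[doc["name"]]
    func(**doc["example"])            # runtime check: raises if the call fails
    return func


def build_library(docs):
    """Keep only the functions that pass both checks (skipping is an assumption)."""
    library = {}
    for doc in docs:
        try:
            library[doc["name"]] = encapsulate(doc)
        except Exception as err:
            print(f"skipping {doc['name']}: {err}")
    return library


def solve_task(lib, title):
    """Stage 2: a program that chains library functions to answer a query."""
    movie = lib["search_movie"](title=title)
    return lib["get_cast"](movie_id=movie["movie_id"])


library = build_library(TOOL_DOCS)
cast = solve_task(library, "Demo")
print(cast)
```

Dropping a function that fails verification is a simplification chosen for this sketch; the abstract states only that syntax and runtime correctness are verified, not what happens on failure.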
