Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents

Zhengliang Shi, Shen Gao, Lingyong Yan, Yue Feng, Xiuyi Chen, Zhumin Chen, Dawei Yin, Suzan Verberne, Zhaochun Ren

TL;DR

AutoTools addresses the scalability and flexibility limits of LLM tool use with a two-stage framework. First, the LLM encapsulates tool documentation into callable functions and verifies their syntax and runtime correctness. Second, it writes executable programs that call these functions from a shared function library. To extend the approach to smaller LLMs, AutoTools-learning trains models on 34k high-quality synthetic examples, which the authors release, spanning three tasks: documentation understanding, relevance learning, and function programming. Results on RestBench, ToolBench, and the new AutoTools-Eval benchmark show that AutoTools improves tool encapsulation accuracy and end-to-end tool-use performance, and that AutoTools-learning yields further gains for open-source models. The authors point to multi-modal, vision-enabled tool agents and broader open-source adoption as future directions.

Abstract

Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, enabling them to solve practical tasks. Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning. However, this manual process requires domain expertise and struggles to scale to large toolsets. Additionally, these methods rely heavily on ad-hoc inference techniques or special tokens to integrate free-form LLM generation with tool-calling actions, limiting the LLM's flexibility in handling diverse tool specifications and integrating multiple tools. In this work, we propose AutoTools, a framework that enables LLMs to automate the tool-use workflow. Specifically, the LLM automatically transforms tool documentation into callable functions, verifying syntax and runtime correctness. Then, the LLM integrates these functions into executable programs to solve practical tasks, flexibly grounding tool-use actions into its reasoning processes. Extensive experiments on existing and newly collected, more challenging benchmarks illustrate the superiority of our framework. Inspired by these promising results, we further investigate how to improve the expertise of LLMs, especially open-source LLMs with fewer parameters, within AutoTools. Thus, we propose the AutoTools-learning approach, training the LLMs with three learning tasks on 34k instances of high-quality synthetic data, including documentation understanding, relevance learning, and function programming. Fine-grained results validate the effectiveness of our overall training approach and each individual task. Our methods are an important step towards the use of LLMs for solving real-world tasks with external tools.
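
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate sources below stand in for LLM-generated code, the tool name get_weather and its stubbed behavior are hypothetical, and the checks (parsing with ast, then one trial call) are assumed as a simple form of syntax and runtime verification.

```python
import ast

# Stand-ins for function sources an LLM might generate from tool documentation.
# All names and behaviors here are hypothetical and stubbed (no network calls).
CANDIDATES = {
    "get_weather": (
        'def get_weather(city, unit="celsius"):\n'
        '    return {"city": city, "unit": unit, "forecast": "sunny"}\n'
    ),
    "broken_syntax": "def broken_syntax(city:\n    return city\n",
    "broken_runtime": "def broken_runtime(city):\n    return undefined_name\n",
}


def encapsulate(source, name, test_kwargs):
    """Return the callable defined in `source` if it passes both checks, else None."""
    # Syntax check: the generated source must parse as Python.
    try:
        ast.parse(source)
    except SyntaxError:
        return None
    # Runtime check: define the function and call it once with trial arguments.
    namespace = {}
    try:
        exec(source, namespace)
        namespace[name](**test_kwargs)
    except Exception:
        return None
    return namespace[name]


# Stage 1: keep only the functions that pass verification.
library = {}
for tool_name, tool_source in CANDIDATES.items():
    fn = encapsulate(tool_source, tool_name, {"city": "Paris"})
    if fn is not None:
        library[tool_name] = fn


# Stage 2: a task program that calls verified functions from the library.
def solve_task(lib):
    report = lib["get_weather"](city="Paris")
    return f"{report['city']}: {report['forecast']}"


print(sorted(library))      # ['get_weather']
print(solve_task(library))  # Paris: sunny
```

Only the candidate that both parses and runs reaches the library, so the task program in the second stage never calls an unverified function.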

Paper Structure

This paper contains 28 sections, 7 equations, 5 figures, and 11 tables.

Figures (5)

  • Figure 1: Comparison between conventional tool-use flow (a) and the proposed framework (b).
  • Figure 2: An overview of the proposed framework AutoTools, in which the LLM (1) automatically encapsulates diverse tools into unified callable functions and (2) directly utilizes these functions through programming.
  • Figure 3: Details of our integration verification (described in the paper's verification section).
  • Figure 4: The step (turn) level performance evaluation.
  • Figure 5: Average consumed tokens along with performance (success rate) for different methods.