
Learning Evolving Tools for Large Language Models

Guoxin Chen, Zhong Zhang, Xin Cong, Fangda Guo, Yesai Wu, Yankai Lin, Wenzheng Feng, Yasheng Wang

TL;DR

ToolEVO is an MCTS-based framework that addresses tool variability in large language model tool learning by pairing active environment interaction with self-reflection and autonomous tool updates. It lets LLMs explore dynamic APIs, reflect on invocation and deprecation errors, and update the tool usage descriptions in their prompts, which improves robustness in static, dynamic, and out-of-distribution (OOD) settings. A new benchmark, ToolQA-D, simulates API evolution by perturbing the collected tool usage into in-distribution and OOD variants, so that adaptability can be evaluated. In experiments against open-source and closed-source model baselines, ToolEVO shows more stable performance and better generalization under tool variability, underscoring the importance of training in dynamic environments for effective tool learning. The work also includes an ablation study and an analysis of changes to API names, parameters, and response formats, which support the value of the self-reflection and tool-update components.

Abstract

Tool learning enables large language models (LLMs) to interact with external tools and APIs, greatly expanding the application scope of LLMs. However, due to the dynamic nature of external environments, these tools and APIs may become outdated over time, preventing LLMs from correctly invoking tools. Existing research primarily focuses on static environments and overlooks this issue, limiting the adaptability of LLMs in real-world applications. In this paper, we propose ToolEVO, a novel framework designed to enhance the adaptive and reflective capabilities of LLMs against tool variability. By leveraging Monte Carlo Tree Search, ToolEVO facilitates active exploration and interaction of LLMs within dynamic environments, allowing for autonomous self-reflection and self-updating of tool usage based on environmental feedback. Additionally, we introduce ToolQA-D, a benchmark specifically designed to evaluate the impact of tool variability. Extensive experiments demonstrate the effectiveness and stability of our approach, highlighting the importance of adaptability to tool variability for effective tool learning. Code: https://github.com/Chen-GX/ToolEVO
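
The reflect-and-update behaviour described above can be illustrated with a small sketch. It is not the authors' implementation: it omits MCTS and fine-tuning entirely, and every name in it (`InvocationError`, `DeprecationError`, `ToolMemory`, `mock_server`, `call_with_reflection`, and the `get_weather`/`fetch_weather` APIs) is hypothetical. It also assumes that the server's deprecation message names the replacement API, and it uses a hard-coded parameter fix as a stand-in for the model's own reflection.

```python
from dataclasses import dataclass, field


class InvocationError(Exception):
    """Hypothetical: the API is current, but the arguments were wrong."""


class DeprecationError(Exception):
    """Hypothetical: the API was renamed or removed on the server."""

    def __init__(self, message, replacement):
        super().__init__(message)
        self.replacement = replacement


@dataclass
class ToolMemory:
    """Tool usage the model keeps in its prompt: API name -> parameter names."""
    usage: dict = field(default_factory=dict)

    def rename(self, old_name, new_name):
        # Move the outdated entry to the name reported by the server.
        self.usage[new_name] = self.usage.pop(old_name)


def mock_server(name, **params):
    """Toy server whose deployed API has drifted from the collected usage."""
    if name == "get_weather":
        raise DeprecationError("get_weather is deprecated",
                               replacement="fetch_weather")
    if name == "fetch_weather":
        if "city" not in params:
            raise InvocationError("missing required parameter: city")
        return {"city": params["city"], "forecast": "sunny"}
    raise InvocationError(f"unknown API: {name}")


def call_with_reflection(memory, name, params, max_attempts=3):
    """Retry a call, reacting differently to the two error types."""
    trace = []
    for _ in range(max_attempts):
        try:
            result = mock_server(name, **params)
            trace.append(("ok", name))
            return result, trace
        except DeprecationError as err:
            # The tool itself changed: update the stored usage, then retry.
            trace.append(("deprecated", name))
            memory.rename(name, err.replacement)
            name = err.replacement
        except InvocationError:
            # The tool is current: only the arguments need correcting.
            # Stand-in for model reflection: a hard-coded parameter fix.
            trace.append(("invocation_error", name))
            params = {"city": params.get("location", "unknown")}
            memory.usage[name] = ["city"]
    return None, trace


memory = ToolMemory(usage={"get_weather": ["location"]})
result, trace = call_with_reflection(memory, "get_weather",
                                     {"location": "Paris"})
```

In this toy run the first call hits the deprecated API, the second reaches the new API with an outdated parameter name, and the third succeeds, leaving `memory.usage` describing only the current API. In the actual framework the correction would come from the LLM reflecting on environmental feedback, not from fixed rules.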

Paper Structure

This paper contains 53 sections, 5 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: (Left) An example of inconsistent usage (name, parameters, or response formats) between the collected APIs available to LLMs and the latest APIs deployed on the server. The collected APIs may become outdated over time. (Right) An overview of our ToolEVO. The LLM engages with the dynamic environment using MCTS for fine-tuning against tool variability, reflecting and updating tool usage based on environmental feedback. Each node in MCTS contains an API invocation.
  • Figure 2: Impact of tool variability. "Consistent APIs" refers to APIs whose usage is the same for the LLM and the server. "Changed APIs" refers to APIs whose usage, as available to the LLM, has become outdated over time. "Static-SFT" denotes supervised fine-tuning on static tool usage data, which gives the model no adaptability to tool variability. Our method successfully adapts to API changes.
  • Figure 3: Examples of self-reflection and tool update. Invocation errors indicate that the input parameters of the API need to be corrected. In contrast, deprecation errors suggest that the input parameters are correct, but the API has been deprecated, necessitating an update in API usage.
  • Figure 4: Analysis on API name.
  • Figure 5: Analysis on API parameters.
  • ...and 2 more figures