Learning Evolving Tools for Large Language Models

Guoxin Chen; Zhong Zhang; Xin Cong; Fangda Guo; Yesai Wu; Yankai Lin; Wenzheng Feng; Yasheng Wang

TL;DR

ToolEVO is an MCTS-based framework that addresses tool variability in tool learning for large language models by combining active environment interaction with self-reflection and autonomous tool updates. It lets LLMs explore dynamic APIs, reflect on invocation and deprecation errors, and update the tool usage descriptions in their prompts, which improves robustness in static, dynamic, and out-of-distribution (OOD) settings. A new benchmark, ToolQA-D, simulates API evolution by perturbing the collected tool usage into in-distribution and OOD variants so that adaptability can be evaluated. In experiments against open-source and closed-source model baselines, ToolEVO shows more stable performance and better generalization under tool variability, underscoring the importance of training in dynamic environments. The work also includes an ablation study and an analysis of changes to API names, parameters, and response formats, which support the value of the self-reflection and tool-update components.
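
The three kinds of change named above (API name, parameter, response format) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ToolSpec`, the helper functions, and the `get_weather` tool are hypothetical and are not taken from ToolQA-D, and the sketch does not attempt to reproduce how the benchmark assigns perturbations to its in-distribution and OOD sets.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical description of one tool as it might appear in a prompt."""
    name: str
    params: Tuple[str, ...]
    response_format: str


def rename_api(spec: ToolSpec, new_name: str) -> ToolSpec:
    # API-name change: same behaviour, different entry point.
    return replace(spec, name=new_name)


def rename_param(spec: ToolSpec, old: str, new: str) -> ToolSpec:
    # Parameter change: one argument is renamed, the others are kept.
    return replace(spec, params=tuple(new if p == old else p for p in spec.params))


def change_response_format(spec: ToolSpec, new_format: str) -> ToolSpec:
    # Response-format change: the tool returns its result in a new shape.
    return replace(spec, response_format=new_format)


# A made-up "collected" tool and two perturbed variants of it.
collected = ToolSpec(name="get_weather", params=("city",), response_format="text")
variant_a = rename_param(rename_api(collected, "fetch_weather"), "city", "location")
variant_b = change_response_format(rename_api(collected, "weather_lookup_v2"), "json")
```

A model that only memorized `collected` would fail on both variants, which is the kind of gap a variability benchmark is meant to expose.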

Abstract

Tool learning enables large language models (LLMs) to interact with external tools and APIs, greatly expanding the application scope of LLMs. However, due to the dynamic nature of external environments, these tools and APIs may become outdated over time, preventing LLMs from correctly invoking tools. Existing research primarily focuses on static environments and overlooks this issue, limiting the adaptability of LLMs in real-world applications. In this paper, we propose ToolEVO, a novel framework designed to enhance the adaptive and reflective capabilities of LLMs against tool variability. By leveraging Monte Carlo Tree Search, ToolEVO facilitates active exploration and interaction of LLMs within dynamic environments, allowing for autonomous self-reflection and self-updating of tool usage based on environmental feedback. Additionally, we introduce ToolQA-D, a benchmark specifically designed to evaluate the impact of tool variability. Extensive experiments demonstrate the effectiveness and stability of our approach, highlighting the importance of adaptability to tool variability for effective tool learning. Code: https://github.com/Chen-GX/ToolEVO
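
The idea of self-updating tool usage from environmental feedback can be sketched as a small retry loop. This is a toy illustration under stated assumptions, not the ToolEVO implementation: it omits Monte Carlo Tree Search entirely, a regular expression stands in for LLM self-reflection, and `ToyEnvironment`, `reflect`, `invoke_with_update`, and the `search_v1`/`search_v2` APIs are all hypothetical names.

```python
import re
from typing import Dict, Optional, Tuple


class ToyEnvironment:
    """Hypothetical tool server in which 'search_v1' was renamed to 'search_v2'."""

    def __init__(self) -> None:
        self.apis = {"search_v2": lambda query: f"results for {query}"}
        self.deprecated = {"search_v1": "search_v2"}

    def call(self, name: str, **kwargs) -> Tuple[bool, str]:
        # Return (success flag, result or error message).
        if name in self.deprecated:
            new_name = self.deprecated[name]
            return False, f"Error: '{name}' is deprecated; use '{new_name}' instead."
        if name not in self.apis:
            return False, f"Error: unknown API '{name}'."
        return True, self.apis[name](**kwargs)


def reflect(feedback: str) -> Optional[str]:
    # Stand-in for self-reflection: extract a suggested replacement API
    # from the error message, if the message contains one.
    match = re.search(r"use '(\w+)' instead", feedback)
    return match.group(1) if match else None


def invoke_with_update(env: ToyEnvironment, tool_docs: Dict[str, str],
                       tool: str, max_attempts: int = 2, **kwargs) -> Optional[str]:
    # Call the API currently recorded for `tool`; on failure, reflect on the
    # feedback and overwrite the recorded usage before retrying.
    for _ in range(max_attempts):
        ok, output = env.call(tool_docs[tool], **kwargs)
        if ok:
            return output
        suggestion = reflect(output)
        if suggestion is None:
            return None
        tool_docs[tool] = suggestion  # the tool-update step
    return None


env = ToyEnvironment()
tool_docs = {"search": "search_v1"}  # outdated usage held by the agent
result = invoke_with_update(env, tool_docs, "search", query="cats")
```

After the call, `tool_docs` records the new API name, so later invocations no longer trigger the deprecation error.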
