MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation

Xiaohan Wang, Dian Li, Yilin Zhao, Sinbadliu, Hui Wang

TL;DR

Through large-scale instruction tuning, the MetaTool model demonstrates impressive zero-shot generalizability on new tasks, achieving results comparable to ChatGPT in both tool-based planning and chatting scenarios.

Abstract

Utilizing tools with Large Language Models (LLMs) is essential for grounding AI agents in real-world applications. The prevailing approaches are few-shot prompting with demonstrations and fine-tuning with expert annotations. However, in-context demonstrations alone may fail to cover sufficient knowledge for complex tools and tasks. Training on solution paths is also hindered by the high cost of expert annotations and by the difficulty of generalizing to new tools. A core challenge of generalizable tool use lies in understanding the "meta", or fundamental nature, of tools that is transferable across tasks, such as causality and constraints. In this paper, we present MetaTool, a novel tool learning methodology designed to generalize across any reusable toolset. Our approach incorporates a self-supervised augmentation technique derived from a series of meta-tasks, each of which involves predicting masked elements in the tool execution process. The self-supervised procedure enables scalable generation of high-quality QA data, which is well suited to supervising tool understanding. By incorporating meta-task data into task-oriented training, our method significantly enhances the performance of open-source LLMs, achieving results comparable to ChatGPT in both tool-based planning and chatting scenarios. Through large-scale instruction tuning, the MetaTool model demonstrates impressive zero-shot generalizability on new tasks.

Paper Structure (18 sections, 1 equation, 4 figures, 4 tables)


Figures (4)

  • Figure 1: Paradigm comparison between existing tool learning methods and the proposed meta-task augmentation. While the prevailing methods are limited in generalizing to complex scenarios or new tools, MetaTool gains transferable tool understanding from task-agnostic knowledge.
  • Figure 2: Illustration of developing self-supervised meta-tasks from the unsupervised tool execution process.
  • Figure 3: Two-step approach to constructing the metaset. We illustrate two example processes, one for the tool-oriented scenario and one for the tool-augmented scenario, neither of which requires expert annotation. $x_i^D, y_i^D$ denote the $i$-th question-answer pair of the decision-making meta-task, and analogously for the other meta-tasks.
  • Figure 4: Case study of MetaTool compared with two baselines on the BlocksWorld task. Actions in red denote invalid ones (e.g., picking up a block at the bottom). LLaMA3-solution is the LLaMA model trained on task solution data.
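
The abstract describes meta-tasks as predicting masked elements of the tool execution process, which yields QA data without expert annotation. A minimal Python sketch of that idea follows. It is not the authors' implementation: the `ToolStep` record, its three fields, the `[MASK]` token, the question template, and the BlocksWorld-style example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolStep:
    """Hypothetical record of one tool execution: state, action, outcome."""
    state: str
    action: str
    outcome: str


def make_masked_qa(step: ToolStep) -> list[dict[str, str]]:
    """Build one QA pair per field by hiding that field and asking for it."""
    fields = {"state": step.state, "action": step.action, "outcome": step.outcome}
    pairs = []
    for masked in fields:
        # Show every field except the masked one, which becomes [MASK].
        shown = "; ".join(
            f"{name}: {'[MASK]' if name == masked else value}"
            for name, value in fields.items()
        )
        pairs.append({
            "masked": masked,
            "question": f"{shown}. What is the masked {masked}?",
            "answer": fields[masked],  # the hidden value is the supervision label
        })
    return pairs


# Illustrative values only; not taken from the paper's data.
step = ToolStep(
    state="block A is on the table, hand is empty",
    action="pick_up(A)",
    outcome="hand holds block A",
)
qa_pairs = make_masked_qa(step)
```

Each recorded execution yields three QA pairs here, one per hidden field. Masking the action gives a question in the spirit of the decision-making meta-task named in Figure 3; the other two maskings are illustrative stand-ins, since this page does not list the paper's full set of meta-tasks.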