ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback

Qinzhuo Wu, Wei Liu, Jian Luan, Bin Wang

TL;DR

This work constructs MGToolBench, a training dataset containing statement-level and category-level instructions that better reflect real-world scenarios, and proposes ToolPlanner, a two-stage reinforcement learning framework that uses path planning and two feedback mechanisms to enhance the LLM's task completion and instruction-following capabilities.

Abstract

Recently, tool-augmented LLMs have gained increasing attention. Given an instruction, tool-augmented LLMs can interact with various external tools over multiple rounds and provide a final answer. However, previous LLMs were trained on overly detailed instructions that included API names or parameters, whereas real users would not explicitly mention these API details. This leads to a gap between trained LLMs and real-world scenarios. In addition, most works ignore whether the interaction process follows the instruction. To address these issues, we constructed a training dataset called MGToolBench, which contains statement-level and category-level instructions to better reflect real-world scenarios. In addition, we propose ToolPlanner, a two-stage reinforcement learning framework that utilizes path planning and two feedback mechanisms to enhance the LLM's task completion and instruction-following capabilities. Experimental results show that ToolPlanner significantly improves the Match Rate, Pass Rate, and Win Rate by 26.8%, 20.2%, and 5.6%, respectively, compared to the SOTA model. Human evaluation verifies that the multi-granularity instructions can better align with users' usage habits. Our data and code will be released upon acceptance.

Paper Structure (46 sections, 5 equations, 9 figures, 27 tables)

Figures (9)

  • Figure 1: Several instructions and their granularity levels from real users, ToolBench, and MGToolBench. Real users tend to provide instructions at a higher level, such as Statement or Category, while ToolBench often consists of more detailed instructions at the API level.
  • Figure 2: Descriptions and examples of instructions at different granularity levels.
  • Figure 3: MGToolBench Dataset Pipeline.
  • Figure 4: (Top) Overview of our proposed ToolPlanner. (Bottom Left) An external tool pool with 6 candidate APIs. (Bottom Right) Results of 7 candidate solutions on our metrics.
  • Figure 5: Two solution trees and their pairwise responses for a tool-level instruction.
  • ...and 4 more figures