Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang

TL;DR

This work addresses the underutilization of failed explorations in tool-augmented LLMs by introducing an inference trajectory optimization framework. It constructs a step-wise preference dataset, ToolPreference, from DFSDT decision trees and applies direct preference optimization (DPO) after a standard SFT phase, so that the model learns to prefer steps on successful paths over steps from failed explorations. Applied to ToolBench data, the approach yields substantial gains in pass rate and win rate, generalizes better to unseen APIs, and reasons more efficiently than the baselines. The method is reported to be model-agnostic across several backbones and leaves room for future enhancements in tool-based reasoning tasks.

Abstract

Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to improve their reasoning capabilities on complex tasks. This enables them to act as intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2023] utilizes the depth-first search-based decision tree (DFSDT) mechanism for multi-step reasoning with 16,000+ real-world APIs, effectively enhancing the performance of tool-augmented LLMs compared to traditional chain reasoning mechanisms. However, their approach employs only the successful paths from decision trees (also called inference trees) for supervised fine-tuning (SFT), missing out on the potential learning opportunities from failed paths. Motivated by this limitation, we propose an inference trajectory optimization framework based on preference learning. We first introduce a novel method for constructing step-wise preference data from tree-like expert trajectories, which leverages the previously ignored failed explorations in the decision trees. In the subsequent training phase, we first fine-tune the LLM with successful tool-usage expert trajectories and then apply direct preference optimization (DPO) with the preference data to update the LLM's policy, resulting in our ToolPrefer-LLaMA (TP-LLaMA) model. This approach not only enhances the utilization of original expert data but also broadens the learning space of the model. Our experiments demonstrate that by obtaining insights from errors in inference trees, TP-LLaMA outperforms the baselines by a large margin in almost all test scenarios and exhibits better generalization capabilities with unseen APIs. TP-LLaMA also demonstrates superior reasoning efficiency compared to the baselines, making it more suitable for complex tool-usage reasoning tasks.
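
The two ingredients named in the abstract, step-wise preference pairs drawn from a decision tree and a DPO update on those pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` structure, the rule "the child on the successful path is preferred over each explored sibling", and the function names are assumptions made for illustration. The loss is the standard per-pair DPO objective, with `beta=0.1` chosen arbitrarily.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One step in a decision tree (hypothetical structure, not from the paper)."""
    action: str                                  # a thought or tool call, as text
    children: List["Node"] = field(default_factory=list)
    is_success_leaf: bool = False                # True only for a leaf that solved the task


def find_success_path(node: Node) -> Optional[List[Node]]:
    """Depth-first search for a root-to-leaf path ending in a successful leaf."""
    if node.is_success_leaf:
        return [node]
    for child in node.children:
        tail = find_success_path(child)
        if tail is not None:
            return [node] + tail
    return None


def stepwise_pairs(root: Node) -> List[Tuple[List[str], str, str]]:
    """Return (context, preferred, dispreferred) triples.

    Assumed rule: at each node on the successful path, the child that
    continues the path is preferred over every other explored sibling.
    """
    path = find_success_path(root)
    if path is None:
        return []
    pairs = []
    for depth in range(len(path) - 1):
        parent, chosen = path[depth], path[depth + 1]
        context = [n.action for n in path[: depth + 1]]
        for sibling in parent.children:
            if sibling is not chosen:
                pairs.append((context, chosen.action, sibling.action))
    return pairs


def dpo_loss(logp_w: float, ref_logp_w: float,
             logp_l: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy tree: branch A fails, branch B succeeds after one failed attempt (B2).
tree = Node("root", [
    Node("A", [Node("A1")]),
    Node("B", [Node("B2"), Node("B1", is_success_leaf=True)]),
])
pairs = stepwise_pairs(tree)
```

On the toy tree, `pairs` holds one triple per branching point on the successful path. In practice the four log-probabilities passed to `dpo_loss` would come from the policy being trained and a frozen reference model (here, presumably the SFT model), each scoring the preferred and dispreferred step given the same context.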

Paper Structure

This paper contains 30 sections, 8 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Our Inference Trajectory Optimization Framework.
  • Figure 2: Depth-first search-based decision tree and two preference data construction methods.

Theorems & Definitions (2)

  • Remark 1
  • Remark 2