Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation

Fan Yin, Zifeng Wang, I-Hung Hsu, Jun Yan, Ke Jiang, Yanfei Chen, Jindong Gu, Long T. Le, Kai-Wei Chang, Chen-Yu Lee, Hamid Palangi, Tomas Pfister

TL;DR

Magnet tackles the challenge of robustly training language model agents for multi-turn tool use by synthesizing high-quality training trajectories through graph-based function execution paths and hint-driven distillation. It couples a back-and-forth translation backbone with Insert, Merge, and Split node operations to generate diverse, nested, and context-dependent function call sequences, while employing positive and negative trajectory signals from a teacher model to steer learning via supervised fine-tuning and preference optimization. The approach demonstrates strong gains on BFCL-v3 and ToolQuery with Magnet-14B-mDPO, surpassing several baselines and the teacher model, and shows the benefits of graph-aware trajectory construction and data mixture. Overall, Magnet offers a scalable, generalizable framework to improve multi-turn function calling (FC) performance across models and domains, with potential extensions to multilingual and multimodal tool ecosystems.

Abstract

Large language models (LLMs) have exhibited the ability to effectively utilize external tools to address user queries. However, their performance may be limited in complex, multi-turn interactions involving users and multiple tools. To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. We model the complicated function interactions in multi-turn cases with a graph and design novel node operations to build reliable signature paths. Motivated by context distillation, when guiding the generation of positive and negative trajectories using a teacher model, we provide reference function call sequences as positive hints in context and contrastive, incorrect function calls as negative hints. Experiments show that, by training on the positive trajectories with supervised fine-tuning and applying preference optimization against the negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery, surpassing the performance of the teacher model Gemini-1.5-pro-002 by a large margin in function calling.
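
The idea of a function graph and a function signature path can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the paper's implementation: the function pool, the rule that an edge exists when one function's output field matches another's input parameter, and the random-walk sampler are all hypothetical, and the Insert, Merge, and Split node operations are not modeled here.

```python
import random

# Hypothetical function pool (not from the paper):
# name -> (input parameter names, output field names).
FUNCTION_POOL = {
    "search_flights": ({"origin", "destination"}, {"flight_id"}),
    "book_flight": ({"flight_id", "card_id"}, {"booking_id"}),
    "get_cards": ({"user_id"}, {"card_id"}),
    "cancel_booking": ({"booking_id"}, {"status"}),
}


def build_graph(pool):
    """Assumed edge rule: u -> v when an output field of u is an input of v."""
    return {
        u: sorted(
            v for v, (v_inputs, _) in pool.items()
            if v != u and pool[u][1] & v_inputs
        )
        for u in pool
    }


def sample_path(graph, start, max_len, rng):
    """Random walk over the graph; stops at a dead end or at max_len nodes."""
    path = [start]
    while len(path) < max_len and graph[path[-1]]:
        path.append(rng.choice(graph[path[-1]]))
    return path


graph = build_graph(FUNCTION_POOL)
path = sample_path(graph, "search_flights", 3, random.Random(0))
print(path)
```

In such a setup, each node on the sampled path would later be translated into a user query and a concrete function call with filled-in parameters, which is the role the abstract assigns to the iterative translation step.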

Paper Structure

This paper contains 21 sections, 3 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Illustration of challenges and common mistakes in multi-turn FC. An agent needs to understand function outputs and complete follow-up queries from users. This brings several challenges to the agent, such as nested FCs (turn 1), long output dependencies (turn 4), and irrelevant functions (turn 5).
  • Figure 2: The pipeline for constructing trajectories of function calling. We divide the pipeline into four parts and depict each part separately. (1) Construction of the function pool and function execution graph; (2) Node operations defined on the function execution graph; (3) Back-and-forth translation to iteratively create multi-turn queries and fill in function parameters; (4) Construction of positive and negative trajectories by context distillation of good and bad hints and instructions.
  • Figure 3: Performance when varying the data mixture with different amounts of irrelevance data.