FunReason-MT Technical Report: Advanced Data Synthesis Solution for Real-world Multi-Turn Tool-use
Zengzhuang Xu, Bingguang Hao, Zechuan Wang, Yuntao Wen, Xinyi Xu, Yang Liu, Long Chen, Dong Wang, Maolin Wang, Tong Zhao, Yicheng Chen, Cunyin Peng, Jinjie Gu, Leilei Gan, Xiangyu Zhao, Chenyi Zhuang, Shi Gu
TL;DR
FunReason-MT tackles the data bottleneck for real-world multi-turn function calling by combining an Environment-API Graph–driven data collection, reverse hard-query synthesis, and a guided CoT refinement loop. The three-phase framework generates targeted, challenging, and coherent trajectories that surpass random sampling and MAS role-playing in diversity and difficulty. Empirical results on BFCLv3 show a 4B-size model achieving state-of-the-art performance among similar models, with further gains on BFCLv4 demonstrating agentic generalization. The work provides a robust data synthesis approach and training recipe that support reliable environment-enabled learning and improved real-world tool use.
Abstract
Function calling (FC) empowers large language models (LLMs) and autonomous agents to interface with external tools, a critical capability for solving complex, real-world problems. As this ability becomes increasingly central to advanced AI systems, the need for high-quality, multi-turn training data to develop and refine it cannot be overstated. Existing data synthesis methods, such as random environment sampling or multi-agent role-playing, are not powerful enough to generate high-quality data in real-world environments. Practical challenges come in three folds: targeted data synthesis, hard query construction, and multi-turn logical dependency. To address these structural deficiencies, we present FunReason-MT, a novel data synthesis framework for real-world multi-turn tool use. FunReason-MT resolves the complexity barrier in multi-turn FC data by employing 1) Environment-API Graph Interactions to gather varied high-quality trajectories with targeted tool, 2) Advanced Tool-Query Synthesis to simplify hard query construction, and 3) Guided Iterative Chain for sophisticated CoT generation. Evaluations on Berkeley Function-Calling Leaderboard (BFCLv3) demonstrate the power of our framework: a 4B model built upon FunReason-MT generated data achieves state-of-the-art performance among comparable-sized models. Further performance improvements on BFCLv4 confirm that FunReason-MT provides a reliable and robust source for agentic learning.
