Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen, Haoze Sun, Tianpeng Li, Fan Yang, Hao Liang, Keer Lu, Bin Cui, Wentao Zhang, Zenan Zhou, Weipeng Chen
TL;DR
This paper tackles the need for LLMs to plan with multiple function calls in real-world tasks. It introduces BUTTON, a bottom-up then top-down pipeline that constructs compositional instructions from atomic tasks and simulates multi-turn function calling via a multi-agent environment. The authors create BUTTONInstruct, an 8k-data dataset, and demonstrate that instruction-tuning LLMs with BUTTONInstruct improves multi-turn function calling performance across benchmarks, especially for smaller models. They also analyze ablations, data scaling, and parallel calling to validate design choices, while acknowledging data quality depends on prompts and suggesting future verification and embodied-AI extensions. The work advances structured interactions with tools in LLMs and provides a scalable framework for teaching planning and decomposition in complex tasks.
Abstract
Large Language Models (LLMs) have exhibited significant potential in performing diverse tasks, including the ability to call functions or use external tools to enhance their performance. While current research on function calling by LLMs primarily focuses on single-turn interactions, this paper addresses the overlooked necessity for LLMs to engage in multi-turn function calling--critical for handling compositional, real-world queries that require planning with functions but not only use functions. To facilitate this, we introduce an approach, BUTTON, which generates synthetic compositional instruction tuning data via bottom-up instruction construction and top-down trajectory generation. In the bottom-up phase, we generate simple atomic tasks based on real-world scenarios and build compositional tasks using heuristic strategies based on atomic tasks. Corresponding function definitions are then synthesized for these compositional tasks. The top-down phase features a multi-agent environment where interactions among simulated humans, assistants, and tools are utilized to gather multi-turn function calling trajectories. This approach ensures task compositionality and allows for effective function and trajectory generation by examining atomic tasks within compositional tasks. We produce a dataset BUTTONInstruct comprising 8k data points and demonstrate its effectiveness through extensive experiments across various LLMs.
