Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning

Xiaolong Wei, Yuehu Dong, Xingliang Wang, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin

TL;DR

To address the local optimization traps that incremental frameworks such as ReAct fall into, the paper introduces a Planner-centric Plan-Execute framework that plans complex queries globally as a Directed Acyclic Graph (DAG). It presents ComplexTool-Plan, a large-scale benchmark for multi-tool planning, and a two-stage training regime (SFT followed by GRPO reinforcement learning) to improve tool selection and plan structure. Paired with a capable executor, the Planner achieves state-of-the-art end-to-end performance on StableToolBench with minimal inference steps. The work offers a scalable path toward robust orchestration of complex tool workflows in real-world LLM systems.

Abstract

Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps due to their reliance on incremental decision-making processes. To address these limitations, we propose a novel Planner-centric Plan-Execute paradigm that fundamentally resolves local optimization bottlenecks through architectural innovation. Central to our approach is a novel Planner model that performs global Directed Acyclic Graph (DAG) planning for complex queries, enabling optimized execution beyond conventional tool coordination. We also introduce ComplexTool-Plan, a large-scale benchmark dataset featuring complex queries that demand sophisticated multi-tool composition and coordination capabilities. Additionally, we develop a two-stage training methodology that integrates Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), systematically enhancing the Planner's tool selection accuracy and global planning awareness through structured DAG-based planning. When integrated with a capable executor, our framework achieves state-of-the-art performance on the StableToolBench benchmark for complex user queries, demonstrating superior end-to-end execution capabilities and robust handling of intricate multi-tool workflows.
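The second training stage relies on Group Relative Policy Optimization (GRPO), which scores each sampled output relative to the other outputs sampled for the same query. The sketch below illustrates only that group-relative normalization step. The reward values are hypothetical, the function name is ours, and the use of the population standard deviation with a small `eps` is an assumption; the paper's actual reward design is not described in this section.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per plan sampled for the same query.
    Population std and `eps` are illustrative choices, not the paper's.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate plans sampled for one query.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Plans that score above their group's mean receive a positive advantage and are reinforced; plans below the mean are discouraged. When every plan in a group earns the same reward, all advantages are zero and the group contributes no learning signal.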

Paper Structure

This paper contains 25 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An example of a simple versus a complex task. A simple query results in a basic, parallel DAG, while a complex query involving nested logic is translated into a more elaborate, multi-level DAG.
  • Figure 2: Overview of the proposed framework. (a) The Training Process shows the automated pipeline that builds the training dataset and then trains the Planner model via supervised fine-tuning (SFT) and reinforcement learning (GRPO). (b) The Executing Process shows how the trained Planner takes a user query, generates a parallelizable execution plan as a Directed Acyclic Graph (DAG), and orchestrates the tools to produce the final answer.
  • Figure 3: This chart shows our three task difficulties: Easy, Medium, and Hard. Harder tasks have more available tools to choose from (blue) and also require more tools to be used (orange).
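
The executing process in Figure 2(b), where a DAG plan is run with independent tool calls in parallel, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the plan format (a dict of nodes with `tool` and `deps` fields), the `run_plan` helper, and the stub tools are hypothetical and do not reproduce the paper's executor or plan schema.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_plan(plan, tools):
    """Execute a DAG plan level by level.

    plan:  {node_id: {"tool": tool_name, "deps": [node_id, ...]}}  (hypothetical format)
    tools: {tool_name: callable taking the outputs of its deps, in order}
    Returns the per-node results and the list of parallel levels that were run.
    """
    sorter = TopologicalSorter({n: spec["deps"] for n, spec in plan.items()})
    sorter.prepare()  # raises graphlib.CycleError if the plan is not acyclic
    results, levels = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # Every node whose dependencies are finished can run concurrently.
            ready = sorted(sorter.get_ready())
            levels.append(ready)
            futures = {
                n: pool.submit(
                    tools[plan[n]["tool"]],
                    *[results[d] for d in plan[n]["deps"]],
                )
                for n in ready
            }
            for n, fut in futures.items():
                results[n] = fut.result()
                sorter.done(n)
    return results, levels


# Stub tools standing in for real API calls (hypothetical).
tools = {
    "get_weather": lambda: "sunny",
    "get_calendar": lambda: "free after 3pm",
    "suggest_activity": lambda weather, calendar: f"Walk ({weather}, {calendar})",
}

# Two independent lookups feed one dependent call: a two-level DAG.
plan = {
    "weather": {"tool": "get_weather", "deps": []},
    "calendar": {"tool": "get_calendar", "deps": []},
    "suggestion": {"tool": "suggest_activity", "deps": ["weather", "calendar"]},
}

results, levels = run_plan(plan, tools)
```

In this toy plan the two lookups form the first level and run concurrently, and the dependent call runs once both have finished. A simple query in the sense of Figure 1 would have only a single parallel level, while a complex query adds further dependent levels.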