GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning
Jiaqi Wu, Qinlao Zhao, Zefeng Chen, Kai Qin, Yifei Zhao, Xueqian Wang, Yuhang Yao
TL;DR
GAP addresses the sequential bottleneck of ReAct-style tool use with dependency-aware, graph-based planning. The agent constructs a directed acyclic graph $G=(V,E)$ of subtasks and executes independent branches in parallel, one topological execution level at a time, which improves both accuracy and efficiency on multi-hop QA benchmarks. Training follows a two-stage pipeline: supervised fine-tuning on a curated set of 7,000 graph-based planning traces, then end-to-end reinforcement learning with a correctness-based reward, so that the model learns when and how to parallelize tool invocations. Across seven MHQA datasets, GAP outperforms traditional baselines while reducing interaction turns, shortening responses, and lowering deployment cost, which suggests practical potential for scalable, tool-augmented agents.
Abstract
Autonomous agents powered by large language models (LLMs) have shown impressive capabilities in tool manipulation for complex task-solving. However, existing paradigms such as ReAct rely on sequential reasoning and execution, and so fail to exploit the inherent parallelism among independent sub-tasks. This sequential bottleneck leads to inefficient tool utilization and suboptimal performance in multi-step reasoning scenarios. We introduce Graph-based Agent Planning (GAP), a novel framework that explicitly models inter-task dependencies through graph-based planning to enable adaptive parallel and serial tool execution. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs, autonomously determining which tools can be executed in parallel and which must follow sequential dependencies. This dependency-aware orchestration improves both execution efficiency and task accuracy. To train GAP, we construct a high-quality dataset of graph-based planning traces derived from Multi-Hop Question Answering (MHQA) benchmarks. We employ a two-stage training strategy: supervised fine-tuning (SFT) on the curated dataset, followed by reinforcement learning (RL) with a correctness-based reward function on strategically sampled queries where tool-based reasoning provides maximum value. Experimental results on MHQA datasets demonstrate that GAP outperforms traditional ReAct baselines, particularly on multi-step retrieval tasks, while markedly improving tool invocation efficiency through parallelization. The project page is available at: https://github.com/WJQ7777/Graph-Agent-Planning.
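To make the idea of dependency-aware execution concrete, the following is a minimal sketch of how a sub-task graph could be grouped into topological execution levels and run level by level. It is an illustration under stated assumptions, not the released GAP implementation: the names `execution_levels` and `run_plan`, the `tool` callback signature, and the sub-task identifiers `s1`, `s2`, `s3` are all hypothetical, and a thread pool stands in for whatever parallel tool-calling mechanism the agent actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def execution_levels(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group sub-tasks into levels; each task depends only on tasks in earlier levels."""
    remaining = {task: set(parents) for task, parents in deps.items()}
    # Tasks that appear only as prerequisites are treated as having no dependencies.
    for parents in deps.values():
        for parent in parents:
            remaining.setdefault(parent, set())

    levels: List[List[str]] = []
    done: Set[str] = set()
    while remaining:
        # A task is ready once all of its prerequisites have completed.
        ready = sorted(t for t, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("dependency cycle detected; the plan is not a DAG")
        levels.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return levels


def run_plan(
    deps: Dict[str, Set[str]],
    tool: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Run tasks of the same level in parallel and successive levels in sequence."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        for level in execution_levels(deps):
            # Each task receives only the results of its own prerequisites.
            futures = {
                t: pool.submit(tool, t, {p: results[p] for p in deps.get(t, ())})
                for t in level
            }
            for t, future in futures.items():
                results[t] = future.result()
    return results


# Hypothetical two-hop comparison question: s1 and s2 are independent
# retrievals, and s3 compares their answers.
plan = {"s1": set(), "s2": set(), "s3": {"s1", "s2"}}
levels = execution_levels(plan)
```

In this toy plan, `s1` and `s2` land in the same level and can be issued in one interaction turn, while `s3` waits for both. A strictly sequential agent would need three turns for the same plan, which is the kind of saving the abstract attributes to parallelization.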
