GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning

Jiaqi Wu, Qinlao Zhao, Zefeng Chen, Kai Qin, Yifei Zhao, Xueqian Wang, Yuhang Yao

TL;DR

GAP addresses the sequential bottleneck of ReAct-style tool use with dependency-aware, graph-based planning. The model builds a directed acyclic graph $G=(V,E)$ of subtasks, groups them into topological execution levels, and runs the independent subtasks within each level in parallel. Training is a two-stage pipeline: supervised fine-tuning on a curated set of 7,000 graph-based planning traces, followed by end-to-end reinforcement learning with a correctness-based reward, so that the model learns when and how to parallelize tool invocations. Across seven MHQA datasets, GAP outperforms traditional baselines while using fewer interaction turns, shorter responses, and lower deployment cost, indicating practical potential for scalable, tool-augmented agents.
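The idea of topological execution levels can be illustrated with a minimal sketch. It is not the authors' implementation: the names `execution_levels`, `run_plan`, and `fake_tool`, the dictionary encoding of the graph, and the use of a thread pool are all assumptions made for illustration. Each subtask maps to the set of subtasks it depends on, and a subtask becomes runnable once all of its prerequisites have finished.

```python
from concurrent.futures import ThreadPoolExecutor


def execution_levels(deps):
    """Group the nodes of a DAG into levels that can run in parallel.

    deps maps each subtask id to the set of subtask ids it depends on.
    Every subtask must appear as a key (hypothetical encoding).
    """
    remaining = {node: set(pre) for node, pre in deps.items()}
    done, levels = set(), []
    while remaining:
        # A node is ready when all of its prerequisites are already done.
        ready = sorted(n for n, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in plan")
        levels.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return levels


def run_plan(deps, tool):
    """Run tool(node, inputs) level by level; nodes in a level run concurrently."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for level in execution_levels(deps):
            # Each node receives the outputs of its direct prerequisites.
            inputs = [{d: results[d] for d in deps[n]} for n in level]
            for n, out in zip(level, pool.map(tool, level, inputs)):
                results[n] = out
    return results


def fake_tool(name, inputs):
    # Stand-in for a real tool call such as a search query (assumption).
    return f"{name}({','.join(sorted(inputs))})"


if __name__ == "__main__":
    # s1 and s2 are independent; s3 needs both of their results.
    plan = {"s1": set(), "s2": set(), "s3": {"s1", "s2"}}
    print(execution_levels(plan))
    print(run_plan(plan, fake_tool))
```

In this toy plan, `s1` and `s2` share the first level and can be issued together, while `s3` waits for both. A purely sequential agent would need three turns for the same plan; the level-based schedule needs two.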

Abstract

Autonomous agents powered by large language models (LLMs) have shown impressive capabilities in tool manipulation for complex task-solving. However, existing paradigms such as ReAct rely on sequential reasoning and execution, failing to exploit the inherent parallelism among independent sub-tasks. This sequential bottleneck leads to inefficient tool utilization and suboptimal performance in multi-step reasoning scenarios. We introduce Graph-based Agent Planning (GAP), a novel framework that explicitly models inter-task dependencies through graph-based planning to enable adaptive parallel and serial tool execution. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs, autonomously determining which tools can be executed in parallel and which must follow sequential dependencies. This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy. To train GAP, we construct a high-quality dataset of graph-based planning traces derived from the Multi-Hop Question Answering (MHQA) benchmark. We employ a two-stage training strategy: supervised fine-tuning (SFT) on the curated dataset, followed by reinforcement learning (RL) with a correctness-based reward function on strategically sampled queries where tool-based reasoning provides maximum value. Experimental results on MHQA datasets demonstrate that GAP significantly outperforms traditional ReAct baselines, particularly on multi-step retrieval tasks, while achieving dramatic improvements in tool invocation efficiency through intelligent parallelization. The project page is available at: https://github.com/WJQ7777/Graph-Agent-Planning.
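The abstract names a correctness-based reward without giving its form here. One plausible reading, shown below purely as an assumption, is a binary reward based on normalized exact match between the predicted answer and any gold answer, a convention common in QA evaluation. The function names `normalize` and `correctness_reward` are hypothetical, and the actual reward used in the paper may differ.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def correctness_reward(prediction, gold_answers):
    """Return 1.0 if the prediction matches any gold answer after normalization.

    Assumed reward shape for illustration, not taken from the source.
    """
    pred = normalize(prediction)
    return 1.0 if any(pred == normalize(g) for g in gold_answers) else 0.0


if __name__ == "__main__":
    print(correctness_reward("The Eiffel Tower.", ["Eiffel Tower"]))
    print(correctness_reward("Paris", ["London"]))
```

A reward of this kind scores only the final answer, so any preference for parallel plans would have to emerge indirectly through training rather than being rewarded explicitly.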

Paper Structure

This paper contains 35 sections, 7 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of Graph-based Agent Planning paradigm. GAP decomposes tasks into dependency-aware subtasks in the planning stage, enabling identification of parallelizable tool operations. The system supports parallel tool and agent calling for enhanced computational efficiency.
  • Figure 2: Performance-cost trade-off comparison across different models on HotpotQA. GAP-3B achieves the best balance with highest accuracy at lowest cost.
  • Figure 3: Illustration of total turns and response length on HotpotQA and 2WikiMultiHopQA datasets. Left panels show response length distribution, right panels show cumulative percentage of problems solved within different numbers of turns.