OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs
Yifu Lu, Shengjie Liu, Li Dong
TL;DR
OrchDAG tackles the challenge of evaluating and improving multi-turn tool use in LLMs by modeling tool execution as directed acyclic graphs (DAGs) of controllable complexity and generating synthetic data to stress planning and reasoning. It combines a DAG-based data generation pipeline with a graph-aware reward signal for reinforcement learning with verifiable rewards (RLVR), using Graph Edit Distance (GED) to capture structural differences between predicted and ground-truth tool graphs. Experiments show the dataset is solvable by strong models like GPT-4o and Claude-4 but remains challenging for smaller models, and the GED-based reward improves training when combined with GRPO-style (Group Relative Policy Optimization) methods, particularly in single-turn settings. The work highlights the importance of leveraging topological structure and data complexity in multi-turn tool use for building robust, reliable agentic LLMs.
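To make the idea of a graph-aware reward concrete, the following is a minimal sketch and not the authors' implementation. It assumes that every tool appears at most once per plan, so nodes can be matched by tool name; under that assumption the edit distance reduces to counting node and edge insertions and deletions. The function names, the adjacency-dict plan format, and the normalization to [0, 1] are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical plan format: tool name -> tools that consume its output.
Graph = Dict[str, List[str]]


def nodes_and_edges(graph: Graph) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collect the node set and directed edge set of a plan graph."""
    nodes = set(graph)
    edges = set()
    for src, dsts in graph.items():
        for dst in dsts:
            nodes.add(dst)
            edges.add((src, dst))
    return nodes, edges


def label_matched_ged(pred: Graph, gold: Graph) -> int:
    """Simplified GED: nodes are matched by name, so the distance is the
    number of node and edge insertions/deletions (unit cost each)."""
    pred_nodes, pred_edges = nodes_and_edges(pred)
    gold_nodes, gold_edges = nodes_and_edges(gold)
    return len(pred_nodes ^ gold_nodes) + len(pred_edges ^ gold_edges)


def ged_reward(pred: Graph, gold: Graph) -> float:
    """Map the distance to [0, 1]; 1.0 means the graphs are identical.
    The normalizer is the largest distance possible for these two graphs
    (reached when they share no nodes or edges). Assumed, not from the paper."""
    pred_nodes, pred_edges = nodes_and_edges(pred)
    gold_nodes, gold_edges = nodes_and_edges(gold)
    total = len(pred_nodes) + len(gold_nodes) + len(pred_edges) + len(gold_edges)
    if total == 0:
        return 1.0
    return 1.0 - label_matched_ged(pred, gold) / total


gold_plan = {"search": ["summarize"], "summarize": []}
missing_edge = {"search": [], "summarize": []}
print(ged_reward(gold_plan, gold_plan))
print(ged_reward(missing_edge, gold_plan))
```

Unlike an exact-match reward, a score of this kind gives partial credit to a plan that selects the right tools but wires one dependency incorrectly. General GED without the name-matching assumption is considerably more expensive to compute.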
Abstract
Tool use by LLM agents has gained traction, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. Using the dataset this pipeline produces, we benchmark model performance and propose a graph-based reward to enhance RLVR training. Experiments show that the dataset presents a challenging but solvable benchmark, and the proposed reward is effective when combined with GRPO-style algorithms, highlighting the importance of leveraging topological structure and data complexity in multi-turn tool use.
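As an illustration of what "DAGs with controllable complexity" can mean in practice, the sketch below samples a random tool-dependency DAG. It is a hypothetical example, not the OrchDAG pipeline: the knobs `num_tools` and `max_parents`, the `tool_i` naming, and the parent-list format are assumptions. Acyclicity is guaranteed by letting each tool depend only on tools with a smaller index.

```python
import random
from typing import Dict, List


def random_tool_dag(num_tools: int, max_parents: int, seed: int = 0) -> Dict[str, List[str]]:
    """Sample a DAG as a map from each tool to the tools it depends on.
    Edges only point from earlier to later tools, so no cycle can form."""
    rng = random.Random(seed)
    names = [f"tool_{i}" for i in range(num_tools)]
    parents: Dict[str, List[str]] = {}
    for i, name in enumerate(names):
        k = rng.randint(0, min(max_parents, i))  # number of dependencies
        parents[name] = sorted(rng.sample(names[:i], k))
    return parents


def topological_order(parents: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: return one valid execution order for the tools."""
    remaining = {name: set(deps) for name, deps in parents.items()}
    ready = sorted(name for name, deps in remaining.items() if not deps)
    order: List[str] = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for child, deps in remaining.items():
            if node in deps:
                deps.remove(node)
                if not deps:
                    ready.append(child)
    if len(order) != len(parents):
        raise ValueError("graph contains a cycle")
    return order


dag = random_tool_dag(num_tools=6, max_parents=2, seed=1)
print(dag)
print(topological_order(dag))
```

Raising `num_tools` lengthens the plan, while raising `max_parents` densifies the dependencies between tools; these are two simple ways to scale difficulty under the assumptions above.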
