Robust and Efficient Tool Orchestration via Layered Execution Structures with Reflective Correction

Tao Zhe, Haoyu Wang, Bo Luo, Min Wu, Wei Fan, Xiao Luo, Zijun Yao, Haifeng Chen, Dongjie Wang

TL;DR

This work models tool orchestration as learning a layered execution structure that captures high-level tool dependencies and induces layer-wise execution through context constraints. It also introduces a schema-aware reflective correction mechanism that detects and repairs errors locally.

Abstract

Tool invocation is a core capability of agentic systems, yet failures often arise not from individual tool calls but from how multiple tools are organized and executed together. Existing approaches tightly couple tool execution with stepwise language reasoning or explicit planning, leading to brittle behavior and high execution overhead. To overcome these limitations, we revisit tool invocation from the perspective of tool orchestration. Our key insight is that effective orchestration does not require precise dependency graphs or fine-grained planning. Instead, a coarse-grained layer structure suffices to provide global guidance, while execution-time errors can be corrected locally. Specifically, we model tool orchestration as learning a layered execution structure that captures high-level tool dependencies, inducing layer-wise execution through context constraints. To handle execution-time failures, we introduce a schema-aware reflective correction mechanism that detects and repairs errors locally. This design confines errors to individual tool calls and avoids re-planning entire execution trajectories. This structured execution paradigm enables a lightweight and reusable orchestration component for agentic systems. Experimental results show that our approach achieves robust tool execution while reducing execution complexity and overhead. Code will be made publicly available.
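
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes tools are already grouped into ordered layers, that a schema is just a mapping from argument name to Python type, and that `propose_args` and `repair_args` stand in for model calls. All function names, the registry layout, and the default `budget` are hypothetical.

```python
from typing import Any, Callable, Dict, List


def schema_errors(args: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Schema gate: report missing or mistyped arguments without calling the tool."""
    errors = []
    for name, expected in schema.items():
        if name not in args:
            errors.append(f"missing argument: {name}")
        elif not isinstance(args[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    return errors


def run_layers(
    layers: List[List[str]],
    tools: Dict[str, Dict[str, Any]],
    propose_args: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    repair_args: Callable[[str, Dict[str, Any], List[str], Dict[str, Any]], Dict[str, Any]],
    budget: int = 2,
) -> Dict[str, Any]:
    """Execute tools layer by layer; repair each failed call locally, up to `budget` times."""
    observations: Dict[str, Any] = {}
    for layer in layers:  # earlier layers finish before later ones start
        for name in layer:
            tool = tools[name]
            args = propose_args(name, observations)
            for attempt in range(budget + 1):
                errors = schema_errors(args, tool["schema"])
                if not errors:
                    try:
                        observations[name] = tool["fn"](**args)
                        break  # success: move on to the next tool
                    except Exception as exc:  # runtime failure is also repaired locally
                        errors = [str(exc)]
                if attempt < budget:
                    args = repair_args(name, args, errors, observations)
            else:
                # Budget exhausted: record the failure for this call only,
                # without re-planning the remaining layers.
                observations[name] = None
    return observations


# Toy registry (hypothetical): parse a string in layer 1, sum the result in layer 2.
tools = {
    "parse": {"fn": lambda text: [int(x) for x in text.split(",")], "schema": {"text": str}},
    "total": {"fn": lambda values: sum(values), "schema": {"values": list}},
}


def propose_args(name: str, obs: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the model: the first proposal for "total" omits its argument.
    return {"text": "1,2,3"} if name == "parse" else {}


def repair_args(name: str, args: Dict[str, Any], errors: List[str], obs: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the LLM repair step: fill the argument from the earlier observation.
    return {"values": obs["parse"]}


result = run_layers([["parse"], ["total"]], tools, propose_args, repair_args)
```

In this toy run the malformed call to `total` is caught by the schema gate before execution and repaired once, while the already completed `parse` call is left untouched. That is the sense in which errors stay confined to individual tool calls.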

Paper Structure (44 sections, 8 equations, 7 figures, 4 tables, 1 algorithm)

Figures (7)

  • Figure 1: Valid vs. invalid tool execution. Premature computation hallucinated by the SLM before data parsing leads to cascading errors, producing a fluent but incorrect final answer report.
  • Figure 2: Overview of ModelTool. Left: layer assignment from the task query and tool descriptions. Top-right: repair loop for failed tool calls (schema gate $\rightarrow$ LLM repair) under a budget. Bottom-right: context-constrained execution with step-specific tool constraints; observations are carried across layers to produce the final answer.
  • Figure 3: Solvable Win Rate (%) of open-source models against GPT-3.5-0613 + ReAct on StableToolBench, computed from per-instance comparisons; the dashed line marks 50% parity.
  • Figure 4: SoPR (%) across Qwen2.5 model sizes (0.5B–7B). Performance degrades gradually, with 3B and 1.5B remaining functional.
  • Figure 5: Ablation study on execution sketch and repair modules. We report SoPR (%) with Qwen2.5-7B on three test sets.
  • ...and 2 more figures