NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration
Yan Jiang, Hao Zhou, LiZhong GU, Ai Han, TianLong Li
TL;DR
NaviAgent tackles the brittleness and scalability limits of large-scale tool use by separating high-level task planning from low-level tool execution in a bilevel architecture. The Task Planning level uses a four-dimensional decision framework to choose among direct response, intent clarification, toolchain retrieval, and tool execution. The Execution level builds a Tool World Navigation Model (TWNM) that encodes structural and behavioral tool dependencies as a dynamic graph. TWNM enables scalable retrieval, substitution, and composition of toolchains, with graph evolution mechanisms that incrementally add tools, prune obsolete ones, and propagate edge attributes to reflect evolving APIs. Closed-loop feedback from real tool interactions adjusts both planning and execution. In the reported experiments, NaviAgent achieves the best task success rates across models and tasks, and incorporating TWNM adds a gain of up to 17 points on complex tasks. The approach shows strong potential for real-world, large-scale tool ecosystems, enabling adaptive, end-to-end navigation rather than brittle, step-by-step tool calls.
Abstract
Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools, enabling them to solve tasks beyond their static knowledge. However, existing agents typically call tools one step at a time without a global view of task structure. Because tools depend on each other, this leads to error accumulation and limited scalability, particularly when scaling to thousands of tools. To address these limitations, we propose NaviAgent, a novel bilevel architecture that decouples task planning from tool execution through graph-based modeling of the tool ecosystem. At the task-planning level, the LLM-based agent decides whether to respond directly, clarify user intent, invoke a toolchain, or execute tool outputs, ensuring broad coverage of interaction scenarios independent of inter-tool complexity. At the execution level, a continuously evolving Tool World Navigation Model (TWNM) encodes structural and behavioral relations among tools, guiding the agent to generate scalable and robust invocation sequences. By incorporating feedback from real tool interactions, NaviAgent supports closed-loop optimization of planning and execution, moving beyond tool calling toward adaptive navigation of large-scale tool ecosystems. Experiments show that NaviAgent achieves the best task success rates across models and tasks, and integrating TWNM further boosts performance by up to 17 points on complex tasks, underscoring its key role in toolchain orchestration.
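
To make the idea of a tool dependency graph that evolves with feedback more concrete, here is a minimal sketch in Python. It is not the paper's TWNM: the class name ToolGraph, its methods, the exponential-moving-average weight update, and the Dijkstra search over negative log weights are all assumptions chosen for illustration, and the tool names are hypothetical. It only shows how adding tools, pruning obsolete ones, and updating edge weights from observed calls could change which toolchain is retrieved.

```python
import heapq
import math
from collections import defaultdict


class ToolGraph:
    """Toy directed graph (hypothetical): edge u -> v means v can consume u's output."""

    def __init__(self):
        # edges[u][v] = estimated success rate of calling v after u, in [0, 1]
        self.edges = defaultdict(dict)
        self.tools = set()

    def add_tool(self, name, depends_on=(), prior=0.5):
        """Incrementally add a tool plus edges from the tools it depends on."""
        self.tools.add(name)
        for parent in depends_on:
            self.tools.add(parent)
            self.edges[parent][name] = prior

    def prune_tool(self, name):
        """Remove an obsolete tool and every edge that touches it."""
        self.tools.discard(name)
        self.edges.pop(name, None)
        for targets in self.edges.values():
            targets.pop(name, None)

    def record_feedback(self, src, dst, success, rate=0.3):
        """Move an edge weight toward 1 (success) or 0 (failure) with an EMA."""
        old = self.edges[src][dst]
        self.edges[src][dst] = (1 - rate) * old + rate * (1.0 if success else 0.0)

    def find_toolchain(self, start, goal):
        """Dijkstra over -log(weight): the chain with the highest product of weights."""
        best = {start: 0.0}
        heap = [(0.0, start, [start])]
        while heap:
            cost, node, path = heapq.heappop(heap)
            if node == goal:
                return path
            if cost > best.get(node, math.inf):
                continue  # stale heap entry
            for nxt, weight in self.edges.get(node, {}).items():
                if weight <= 0.0:
                    continue  # an edge that always fails is unusable
                new_cost = cost - math.log(weight)
                if new_cost < best.get(nxt, math.inf):
                    best[nxt] = new_cost
                    heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
        return None


# Hypothetical tools: two interchangeable weather APIs behind a geocoder.
graph = ToolGraph()
graph.add_tool("geocode")
graph.add_tool("weather_v1", depends_on=["geocode"])
graph.add_tool("weather_v2", depends_on=["geocode"])
graph.add_tool("summarize", depends_on=["weather_v1", "weather_v2"], prior=0.9)

# Feedback from real calls shifts the preferred chain toward weather_v2.
graph.record_feedback("geocode", "weather_v1", success=False)
graph.record_feedback("geocode", "weather_v2", success=True)
chain = graph.find_toolchain("geocode", "summarize")

# Pruning weather_v2 forces substitution with the remaining alternative.
graph.prune_tool("weather_v2")
fallback = graph.find_toolchain("geocode", "summarize")
```

In this sketch, the retrieved chain goes through weather_v2 after the feedback updates and falls back to weather_v1 once weather_v2 is pruned. A real system would also need the planning level to decide when to retrieve a chain at all, which this example does not model.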
