Utility-Guided Agent Orchestration for Efficient LLM Tool Use

Boyan Liu; Gongming Zhao; Hongli Xu

Abstract

Tool-using large language model (LLM) agents often face a fundamental tension between answer quality and execution cost. Fixed workflows are stable but inflexible, while free-form multi-step reasoning methods such as ReAct may improve task performance at the expense of excessive tool calls, longer trajectories, higher token consumption, and increased latency. In this paper, we study agent orchestration as an explicit decision problem rather than leaving it entirely to prompt-level behavior. We propose a utility-guided orchestration policy that selects among actions such as respond, retrieve, tool call, verify, and stop by balancing estimated gain, step cost, uncertainty, and redundancy. Our goal is not to claim universally best task performance, but to provide a controllable and analyzable policy framework for studying quality-cost trade-offs in tool-using LLM agents. Experiments across direct answering, threshold control, fixed workflows, ReAct, and several policy variants show that explicit orchestration signals substantially affect agent behavior. Additional analyses on cost definitions, workflow fairness, and redundancy control further demonstrate that lightweight utility design can provide a defensible and practical mechanism for agent control.
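The action-selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's utility function, so everything here is an assumption made for illustration: the linear form, the treatment of uncertainty as a penalty, the weight values, and the names `Signals`, `Weights`, `utility`, and `select_action`.

```python
from dataclasses import dataclass

# Candidate action names taken from the abstract.
ACTIONS = ("respond", "retrieve", "tool_call", "verify", "stop")


@dataclass(frozen=True)
class Signals:
    """Hypothetical per-action signals; how they are estimated is not shown here."""
    gain: float         # estimated improvement in answer quality
    cost: float         # step cost (e.g. normalized tokens or latency)
    uncertainty: float  # uncertainty about the estimated gain
    redundancy: float   # overlap with actions already taken


@dataclass(frozen=True)
class Weights:
    """Assumed trade-off weights; not values from the paper."""
    cost: float = 1.0
    uncertainty: float = 0.5
    redundancy: float = 0.5


def utility(s: Signals, w: Weights) -> float:
    # Assumed linear form: gain minus weighted penalties.
    return (s.gain
            - w.cost * s.cost
            - w.uncertainty * s.uncertainty
            - w.redundancy * s.redundancy)


def select_action(candidates: dict[str, Signals], w: Weights = Weights()) -> str:
    # Choose the candidate action with the highest utility.
    return max(candidates, key=lambda name: utility(candidates[name], w))


candidates = {
    "retrieve": Signals(gain=0.6, cost=0.2, uncertainty=0.1, redundancy=0.0),
    "tool_call": Signals(gain=0.7, cost=0.5, uncertainty=0.2, redundancy=0.4),
    "stop": Signals(gain=0.1, cost=0.0, uncertainty=0.0, redundancy=0.0),
}
chosen_default = select_action(candidates)
chosen_cost_averse = select_action(candidates, Weights(cost=3.0))
```

Under these assumed numbers, the default weights favor `retrieve`, while raising the cost weight to 3.0 makes `stop` the highest-utility action. This is the kind of controllable quality-cost behavior the abstract describes, although the paper's actual scoring may differ.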

Paper Structure

This paper contains 30 sections, 3 equations, 6 figures, and 5 tables.

Introduction
Motivation
Related Work
Tool-Using LLM Agents
Reasoning, Planning, and Self-Improvement
Agent Systems, Orchestration, and Multi-Agent Frameworks
Efficient and Cost-Aware Inference
Agent Benchmarks and Evaluation
Design
Problem Formulation
Agent State
Utility-Guided Action Selection
Utility Components
Estimated Gain.
Step Cost.
...and 15 more sections

Figures (6)

Figure 1: Overview of the proposed utility-guided agent orchestration framework. At each step, the agent constructs a state representation from the current query, interaction history, and tool observations. A utility scorer evaluates candidate actions using estimated gain, step cost, uncertainty, and redundancy, and the action selector chooses the highest-utility action. The process iterates until a stopping condition is met.
Figure 2: Main quality-cost trade-off measured by F1 versus tokens on the shared HotpotQA sample.
Figure 3: Main quality-cost trade-off measured by F1 versus wall-clock time. This complements Figure 2 by showing the latency-side frontier.
Figure 4: Effect of reasoning depth on answer quality and cost. Increasing the maximum number of reasoning steps improves F1 at first, but also increases token usage and latency, illustrating diminishing marginal returns.
Figure 5: Relationship between heuristic signals and continuation behavior. Continue-rate is much lower in low expected-gain buckets and substantially higher in mid/high-gain buckets, supporting the interpretation of these signals as decision heuristics rather than decorative outputs.
...and 1 more figure
