Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents

Yushu Li, Wenlong Deng, Jiajin Li, Xiaoxiao Li

Abstract

Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, allowing agents to exhaust token and tool budgets on redundant steps or dead-end trajectories. Existing budget-aware methods either require expensive fine-tuning or rely on coarse, trajectory-level heuristics that cannot intervene mid-execution. We propose the Budget-Aware Value Tree (BAVT), a training-free inference-time framework that models multi-hop reasoning as a dynamic search tree guided by step-level value estimation within a single LLM backbone. A key innovation is a budget-conditioned node selection mechanism that uses the remaining resource ratio as a natural scaling exponent over node values, providing a principled, parameter-free transition from broad exploration to greedy exploitation as the budget depletes. To combat the well-known overconfidence of LLM self-evaluation, BAVT employs a residual value predictor that scores relative progress rather than absolute state quality, enabling reliable pruning of uninformative or redundant tool calls. We further provide a theoretical convergence guarantee, proving that BAVT reaches a terminal answer with probability at least $1-\epsilon$ under an explicit finite budget bound. Extensive evaluations on four multi-hop QA benchmarks across two model families demonstrate that BAVT consistently outperforms parallel sampling baselines. Most notably, BAVT under strict low-budget constraints surpasses baseline performance at $4\times$ the resource allocation, establishing that intelligent budget management fundamentally outperforms brute-force compute scaling.
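The budget-conditioned selection rule can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the form used here is an assumption: each frontier node is sampled with probability proportional to its value raised to the power $1/r$, where $r$ is the remaining budget ratio. The function names, the `eps` guard, and the max-normalisation are likewise hypothetical choices made for illustration.

```python
import random


def selection_probs(values, remaining, total, eps=1e-6):
    """Sampling probabilities over frontier nodes.

    ASSUMPTION (not taken from the paper): weight_i = value_i ** (1 / r),
    with r = remaining / total. At r = 1 the weights are proportional to
    the values (broad exploration); as r -> 0 the exponent grows and the
    distribution concentrates on the best node (greedy exploitation).
    """
    if total <= 0:
        raise ValueError("total budget must be positive")
    if not values:
        raise ValueError("need at least one candidate node")

    r = max(remaining / total, eps)  # guard against division by zero
    exponent = 1.0 / r

    vmax = max(values)
    if vmax <= 0:
        # No informative values: fall back to a uniform distribution.
        return [1.0 / len(values)] * len(values)

    # Dividing by the maximum keeps every base in [0, 1], so large
    # exponents underflow harmlessly to 0.0 and the best node keeps weight 1.
    weights = [(max(v, 0.0) / vmax) ** exponent for v in values]
    z = sum(weights)
    return [w / z for w in weights]


def select_node(values, remaining, total, rng=random):
    """Sample the index of the node to expand next."""
    probs = selection_probs(values, remaining, total)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    node_values = [0.2, 0.4, 0.8]
    for left in (20, 10, 5, 1):
        probs = selection_probs(node_values, remaining=left, total=20)
        print(left, [round(p, 4) for p in probs])
```

Under this assumed form no temperature needs tuning: the only quantity driving the shift from exploration to exploitation is the budget ratio itself, which is what the abstract means by a parameter-free transition.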

Paper Structure

This paper contains 55 sections, 1 theorem, 15 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Under the paper's assumptions (labelled assump:progress through assump:bounded_pool in the source), for any failure probability $\epsilon > 0$, there exists a finite budget bound $B$ such that the BAVT framework generates a node satisfying $V(s_t) \ge \tau$ with probability at least $1 - \epsilon$.
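To give a feel for how a finite budget bound of this kind can arise, consider a deliberately simplified stand-in for the paper's assumptions. This is not the paper's proof, and the per-step success probability $p$ is a hypothetical quantity introduced only for illustration. Suppose each expansion independently produces a node with $V(s_t) \ge \tau$ with probability at least $p \in (0, 1)$. Then the chance that $B$ expansions all fail decays geometrically:

```latex
\[
\Pr\bigl[\text{no node with } V(s_t) \ge \tau \text{ in } B \text{ expansions}\bigr]
\;\le\; (1-p)^{B} \;\le\; \epsilon
\quad\text{whenever}\quad
B \;\ge\; \frac{\ln(1/\epsilon)}{\ln\bigl(1/(1-p)\bigr)} .
\]
```

For instance, with $p = 0.2$ and $\epsilon = 0.01$ the bound gives $B \ge \ln(100)/\ln(1.25) \approx 20.6$, so $B = 21$ expansions suffice in this toy setting, since $0.8^{21} \approx 0.0092 \le 0.01$ while $0.8^{20} \approx 0.0115$ does not meet the target.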

Figures (5)

  • Figure 1: Budget-Aware Value Tree (BAVT) versus parallel sampling. Left: Parallel sampling explores many trajectories in parallel but may waste budget on redundant or dead-end paths. Right: BAVT performs tree-structured search with step-level value estimation and budget-aware expansion to adaptively shift from broad exploration (high remaining budget) to deep exploitation (low remaining budget), improving the performance-efficiency trade-off under strict resource constraints.
  • Figure 2: Overview of the Budget-Aware Value Tree (BAVT) framework. a) Budget-Aware Expansion dynamically adjusts the node selection distribution, shifting from exploration to exploitation as resources deplete. b) The Test-Time Scaling Tree models the reasoning process, allowing the agent to explore multiple paths. c) Step-Level Value Estimation utilizes a dual-role Actor-Critic setup within a single backbone to evaluate intermediate progress at the step level.
  • Figure 3: Average performance-efficiency trade-off across the four evaluated multi-hop QA benchmarks for OSS-20B and Qwen3-30B. BAVT operating under strict Low budget constraints (5 calls) consistently rivals or surpasses the baseline's High budget performance (20 calls), demonstrating that intelligent resource management fundamentally outperforms $4\times$ brute-force compute scaling.
  • Figure 4: Performance of the OSS-20B reasoning model on multi-hop QA benchmarks. BAVT achieves strictly superior performance and resource efficiency compared to the baseline across all datasets and budget constraints.
  • Figure 5: Performance of the Qwen3-30B instruct model across multi-hop QA benchmarks. BAVT achieves strictly superior performance, effectively raising the baseline’s ceiling.
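The step-level value estimation shown in Figure 2(c) relies, per the abstract, on a residual predictor that scores relative progress rather than absolute state quality. The sketch below is one possible reading of that idea, not the paper's implementation: it assumes the critic returns a signed progress score `delta` for each step, that a node's value is its parent's value plus `delta` clipped to $[0, 1]$, and that steps with no positive progress are pruned. All names and the `min_progress` threshold are hypothetical.

```python
def update_value(parent_value, delta, lo=0.0, hi=1.0):
    """Child value = parent value + residual progress, clipped to [lo, hi].

    ASSUMPTION: the critic outputs a signed residual `delta` rather than
    an absolute score, so a confident but uninformative step cannot
    inflate the value on its own.
    """
    return min(hi, max(lo, parent_value + delta))


def should_prune(delta, min_progress=0.0):
    """Prune steps that add no new progress (hypothetical threshold)."""
    return delta <= min_progress


if __name__ == "__main__":
    value = 0.3  # value of the current node
    # Hypothetical critic outputs for three candidate tool calls.
    for name, delta in [("new evidence", 0.25),
                        ("repeated query", 0.0),
                        ("off-topic search", -0.1)]:
        if should_prune(delta):
            print(f"{name}: pruned (delta={delta})")
        else:
            print(f"{name}: kept, value={update_value(value, delta):.2f}")
```

Scoring the change rather than the state means a redundant tool call, which leaves the agent no closer to the answer, receives a residual near zero and can be cut before it consumes more budget.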

Theorems & Definitions (3)

  • Remark 1
  • Theorem 1: Probabilistic Convergence to Answer Generation
  • Proof