Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
Sora Miyamoto, Daisuke Oba, Naoaki Okazaki
TL;DR
This work tackles decoding under fixed token budgets in large language models by introducing Budget-Guided MCTS (BG-MCTS), a budget-conditioned tree-search algorithm that shifts from broad exploration to refinement and completion as the remaining budget dwindles. It achieves this through two mechanisms: (i) budget-conditioned selection via BG-PUCT, which anneals exploration and applies a completion-aware value correction, and (ii) budget-guided widening, which adds a controlled generative option to widen the search when beneficial. The approach is validated on math-reasoning benchmarks (MATH500 Level 5 and AIME24/25) using open-weight LLMs (Llama-3.1-8B-Instruct and Qwen-2.5-7B-Instruct) across budgets $B \in \{10{,}000, 20{,}000, 30{,}000\}$, where BG-MCTS consistently outperforms budget-agnostic baselines and exhibits higher-quality final answers near budget exhaustion. The results imply budget-conditioned decoding yields a more reliable accuracy-cost trade-off for fixed-budget inference, with implications for data synthesis and real-world deployment where per-query costs are bounded.
Abstract
Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-agnostic, treating the budget as a termination condition, which can lead to late-stage over-branching or premature termination. We propose Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns its search policy with the remaining token budget: it starts with broad exploration, then prioritizes refinement and answer completion as the budget depletes while reducing late-stage branching from shallow nodes. BG-MCTS consistently outperforms budget-agnostic tree-search baselines across different budgets on MATH500 and AIME24/25 with open-weight LLMs.
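
To make the idea of budget-conditioned selection concrete, the following is a minimal illustrative sketch, not the BG-PUCT rule itself, whose exact form is not given in this summary. It takes the standard PUCT score and scales its exploration coefficient by the fraction of the token budget that remains. The linear schedule, the constant `c_base`, the function names, and the dictionary layout of a child node are all assumptions made for illustration; the completion-aware value correction and budget-guided widening are omitted.

```python
import math

def budget_annealed_puct(q, prior, parent_visits, child_visits,
                         remaining, total, c_base=1.5):
    """Standard PUCT score with an exploration term that shrinks with the budget.

    Assumption: exploration is scaled linearly by the remaining-budget
    fraction. This is a hypothetical schedule, not the paper's BG-PUCT.
    """
    # Fraction of the token budget still available, clamped to [0, 1].
    rho = max(0.0, min(1.0, remaining / total))
    c = c_base * rho
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, parent_visits, remaining, total):
    """Pick the child with the highest budget-annealed score."""
    return max(
        children,
        key=lambda ch: budget_annealed_puct(
            ch["q"], ch["prior"], parent_visits, ch["visits"], remaining, total
        ),
    )

# Hypothetical children: one well-explored branch and one unvisited branch.
children = [
    {"name": "refined", "q": 0.6, "prior": 0.5, "visits": 10},
    {"name": "unvisited", "q": 0.4, "prior": 0.5, "visits": 0},
]
early = select_child(children, parent_visits=10, remaining=10_000, total=10_000)
late = select_child(children, parent_visits=10, remaining=0, total=10_000)
```

Under these assumed numbers, the full-budget call favors the unvisited branch because the exploration bonus dominates, while the exhausted-budget call falls back to the branch with the higher value estimate. This mirrors the qualitative shift from exploration to refinement described above.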
