Adaptive Test-Time Compute Allocation via Learned Heuristics over Categorical Structure
Shuhui Qu
TL;DR
The paper tackles verification-cost-limited reasoning by treating verifier calls as the scarce resource and allocating them at intermediate states rather than globally. It introduces a gated competition framework with (i) deterministic feasibility gates that prune structurally invalid moves, (ii) a hybrid scoring function combining a learned structural distance $D_{\text{type}}$ and a learned residual $r_\theta(w,m)$ to rank surviving moves, and (iii) a state-conditional verification budget $k(w)$ based on local uncertainty. A training recipe uses verifier-labeled candidate lists to learn $r_\theta$, optionally augmented by a remaining-steps trajectory signal to bias toward shorter successful traces. On the MATH benchmark, this approach achieves 55.2% accuracy with 44.8 verifier calls, outperforming best-of-$N$, majority voting, and beam search at the same budget, and demonstrating that fine-grained, state-aware allocation can yield substantial efficiency gains. The work highlights that reducing wasted verification and focusing checks where ambiguity is highest can meaningfully improve the accuracy–cost frontier, with limitations tied to proposal quality and the expressiveness of the structured move interface.
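The three stages named above (gate, rank, budget) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Move` record, the way the structural distance and residual are combined (`-d_type + lam * residual`), and the use of normalized softmax entropy as the "local uncertainty" that sets $k(w)$.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Move:
    """Hypothetical candidate move m proposed at a state w."""
    name: str
    feasible: bool    # result of the deterministic feasibility gate
    d_type: float     # stand-in for D_type after the move (lower is better)
    residual: float   # stand-in for r_theta(w, m) (higher is better)


def score(m: Move, lam: float = 1.0) -> float:
    # Assumed hybrid form: smaller structural distance and larger residual
    # both raise the score. The paper's exact combination may differ.
    return -m.d_type + lam * m.residual


def softmax(xs: Sequence[float]) -> List[float]:
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def budget(scores: Sequence[float], k_min: int = 1, k_max: int = 4) -> int:
    """Assumed k(w): map normalized entropy of the score distribution
    onto [k_min, k_max], capped by the number of candidates."""
    n = len(scores)
    if n <= 1:
        return min(n, k_min)
    p = softmax(scores)
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    u = min(1.0, max(0.0, entropy / math.log(n)))  # 0 = peaked, 1 = flat
    return min(n, k_min + round(u * (k_max - k_min)))


def select_for_verification(moves: Sequence[Move]) -> List[Move]:
    # (i) gate: drop structurally invalid moves before any scoring
    survivors = [m for m in moves if m.feasible]
    if not survivors:
        return []
    # (ii) rank the survivors by the hybrid score
    ranked = sorted(survivors, key=score, reverse=True)
    # (iii) spend verifier calls only on the top k(w) candidates
    k = budget([score(m) for m in ranked])
    return ranked[:k]
```

Under these assumptions, a state with one clearly dominant move receives a single verifier call, while a state whose surviving moves score about equally receives up to `k_max` calls, which is the sense in which checks are concentrated where ambiguity is highest.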
Abstract
Test-time computation has become a primary driver of progress in large language model (LLM) reasoning, but it is increasingly bottlenecked by expensive verification. In many reasoning systems, a large fraction of verifier calls are spent on redundant or unpromising intermediate hypotheses. We study reasoning under a verification-cost-limited setting and ask how verification effort should be allocated across intermediate states. We propose a state-level selective verification framework that combines (i) deterministic feasibility gating over a structured move interface, (ii) pre-verification ranking using a hybrid of learned state-distance and residual scoring, and (iii) adaptive allocation of verifier calls based on local uncertainty. Unlike solution-level best-of-$N$ or uniform intermediate verification, our method distributes verification where it is most informative. On the MATH benchmark, our approach achieves higher accuracy than best-of-$N$, majority voting, and beam search while using 44% fewer verifier calls.
