PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

Zhangyi Liu, Huaizhi Qu, Xiaowei Yin, He Sun, Yanjun Han, Tianlong Chen, Zhun Deng

TL;DR

This work introduces PETS (Principled and Efficient Test-Time Self-Consistency), which initiates a principled study of trajectory allocation through an optimization framework. For the online setting, it proposes a method inspired by the offline framework that adapts budgets to question difficulty while preserving strong theoretical guarantees and computational efficiency.

Abstract

Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-Time Self-Consistency), which initiates a principled study of trajectory allocation through an optimization framework. Central to our approach is the self-consistency rate, a new measure defined as agreement with the infinite-budget majority vote. This formulation makes sample-efficient test-time allocation theoretically grounded and amenable to rigorous analysis. We study both offline and online settings. In the offline regime, where all questions are known in advance, we connect trajectory allocation to crowdsourcing, a classic and well-developed area, by modeling reasoning traces as workers. This perspective allows us to leverage rich existing theory, yielding theoretical guarantees and an efficient majority-voting-based allocation algorithm. In the online streaming regime, where questions arrive sequentially and allocations must be made on the fly, we propose a novel method inspired by the offline framework. Our approach adapts budgets to question difficulty while preserving strong theoretical guarantees and computational efficiency. Experiments show that PETS consistently outperforms uniform allocation. On GPQA, PETS achieves perfect self-consistency in both settings while reducing the sampling budget by up to 75% (offline) and 55% (online) relative to uniform allocation. Code is available at https://github.com/ZDCSlab/PETS.
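
To make the self-consistency rate concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes each question has a fixed answer distribution `theta`, that the infinite-budget majority vote equals the most probable answer under `theta`, and that finite-budget ties go to the answer sampled first. The function names and the uniform per-question budget are illustrative assumptions.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first (assumption)."""
    counts = Counter(answers)
    # Counter keeps insertion order, and max() returns the first maximal key.
    return max(counts, key=counts.get)


def self_consistency_rate(thetas, budget, trials=2000, seed=0):
    """Monte Carlo estimate of agreement with the infinite-budget majority vote.

    thetas: list of per-question answer distributions (hypothetical inputs).
    budget: number of trajectories sampled per question (uniform allocation).
    """
    rng = random.Random(seed)
    hits = 0
    total = 0
    for theta in thetas:
        options = list(range(len(theta)))
        # With unlimited samples, the majority vote converges to the modal answer.
        target = max(options, key=lambda y: theta[y])
        for _ in range(trials):
            samples = rng.choices(options, weights=theta, k=budget)
            hits += majority_vote(samples) == target
            total += 1
    return hits / total


if __name__ == "__main__":
    questions = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]]
    for b in (1, 5, 15):
        print(b, round(self_consistency_rate(questions, b), 3))
```

Under these assumptions, easy questions (a dominant answer) reach agreement with very few samples while near-tied questions need many more, which is the motivation for allocating budget by difficulty instead of uniformly.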

Paper Structure

This paper contains 47 sections, 10 theorems, 160 equations, 17 figures, 3 tables, 2 algorithms.

Key Result

Lemma 3.1

Given the terminal belief $S^H$, the Bayes-optimal decision for each question is therefore

Figures (17)

  • Figure 1: In this paper, we study how to allocate an LLM’s sampling budget across questions to best match the full-budget outcome under self-consistency. Our results show that PETS substantially reduces the required budget while maintaining accuracy.
  • Figure 2: Budget allocation plan of the offline and online settings on 9 simulated binary-choice questions, $\mathcal{Y}=\{1,2\}$. Each question is associated with $\theta=\max(\theta_1,\theta_2)$; a larger $\theta$ indicates an easier question.
  • Figure 3: Budget allocation curve in the offline setting. "(conf)" denotes the trace confidence-weighted variant. Consistency is computed within each matched-variant comparison group: PETS-Offline vs. Uniform, and PETS-Offline (conf) vs. Uniform (conf).
  • Figure 4: Budget allocation curve in the online setting. "(conf)" denotes the trace confidence-weighted variant. Consistency is computed within each matched-variant comparison group: PETS-Online vs. Uniform, and PETS-Online (conf) vs. Uniform (conf). The Oracle variant assumes access to the latent parameter $\bm\theta$, whereas in the online setting $\bm\theta$ is learned from a training dataset.
  • Figure 5: Probability that multinomial majority voting selects the true best option as a function of budget. We plot $\mathbb{P}\!\left(Y^{\mathrm{Maj}}(B)=\arg\max_{y\in\mathcal{Y}}\theta_y\right)$ versus $\textsc{Budget}\in\{1,\ldots,64\}$ for different ground-truth preference vectors $\bm{\theta}=(\theta_1,\ldots,\theta_M)$. Gray curves (Exact) are computed from the true $\bm{\theta}$ (exact for $M=2,4$; Monte Carlo estimates for $M=10$), while green dotted curves (Probit) are produced by fitting a two-parameter probit model and evaluating the fitted model across budgets. The fitted probit curves closely track the exact/MC curves in all regimes. Panels correspond to $M\in\{2,4,10\}$. For $M=4$, we use: dominate $[0.8,0.1,0.1,0]$, head-heavy $[0.6,0.3,0.05,0.05]$, linear $[0.4,0.3,0.2,0.1]$, flat $[0.3,0.25,0.25,0.2]$, and uniform $[0.25,0.25,0.25,0.25]$. For $M=10$, we use: dominate $[0.8,0.15,0.05,0,\ldots,0]$, head-heavy $[0.25,0.15,0.10,0.0714,\ldots,0.0714]$, linear $[10,9,\ldots,1]/55$, flat $\mathrm{normalize}([10,9,\ldots,1]^{0.4})$, and uniform $[0.1,\ldots,0.1]$.
  • ...and 12 more figures
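
For the binary case ($M=2$) described in the captions for Figures 2 and 5, the probability that a budget-$B$ majority vote selects the more likely option can be computed exactly from the binomial distribution. The sketch below is an illustration under stated assumptions, not the paper's code: ties at even budgets are broken by a fair coin, and the helper names `majority_correct_prob` and `min_budget` are hypothetical.

```python
from math import comb


def majority_correct_prob(theta, budget):
    """Exact P(majority of `budget` votes picks the likelier of two options).

    theta: probability of one option; the likelier option has p = max(theta, 1 - theta).
    Ties (possible only for even budgets) are broken by a fair coin (assumption).
    """
    p = max(theta, 1.0 - theta)
    prob = 0.0
    for k in range(budget + 1):  # k = votes for the likelier option
        pk = comb(budget, k) * p**k * (1.0 - p) ** (budget - k)
        if 2 * k > budget:
            prob += pk
        elif 2 * k == budget:
            prob += 0.5 * pk
    return prob


def min_budget(theta, target, max_budget=64):
    """Smallest budget in 1..max_budget whose majority vote reaches `target`, else None."""
    for b in range(1, max_budget + 1):
        if majority_correct_prob(theta, b) >= target:
            return b
    return None


if __name__ == "__main__":
    for theta in (0.9, 0.75, 0.6):
        print(theta, min_budget(theta, target=0.95))
```

Under these assumptions, a question with $\theta=0.9$ reaches a 0.95 agreement probability with a small budget, while a question with $\theta$ close to $0.5$ may not reach it within 64 samples, consistent with larger $\theta$ indicating an easier question.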

Theorems & Definitions (22)

  • Lemma 3.1: Bayes-optimal terminal decision
  • Theorem 4.1
  • proof : Proof of Lemma 3.1
  • Lemma B.1
  • proof : Proof of Lemma \ref{lem:concave}
  • proof : Proof of Theorem \ref{thm:optimal}
  • Proposition B.2: Gaussian-probit approximation with $1/\sqrt n$ rate
  • proof : Proof of Proposition B.2
  • Remark B.3
  • Lemma B.4: Concavity of $g_{a,b}$ for $k\geqslant k_{\min}$
  • ...and 12 more