PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

Zhangyi Liu; Huaizhi Qu; Xiaowei Yin; He Sun; Yanjun Han; Tianlong Chen; Zhun Deng

TL;DR

This work introduces PETS (Principled and Efficient Test-Time Self-Consistency), which frames trajectory allocation as an optimization problem. For the offline setting it yields an efficient majority-voting-based allocation algorithm; for the online streaming setting it proposes a method, inspired by the offline framework, that adapts budgets to question difficulty while preserving strong theoretical guarantees and computational efficiency.

Abstract

Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-Time Self-Consistency), which initiates a principled study of trajectory allocation through an optimization framework. Central to our approach is the self-consistency rate, a new measure defined as agreement with the infinite-budget majority vote. This formulation makes sample-efficient test-time allocation theoretically grounded and amenable to rigorous analysis. We study both offline and online settings. In the offline regime, where all questions are known in advance, we connect trajectory allocation to crowdsourcing, a classic and well-developed area, by modeling reasoning traces as workers. This perspective allows us to leverage rich existing theory, yielding theoretical guarantees and an efficient majority-voting-based allocation algorithm. In the online streaming regime, where questions arrive sequentially and allocations must be made on the fly, we propose a novel method inspired by the offline framework. Our approach adapts budgets to question difficulty while preserving strong theoretical guarantees and computational efficiency. Experiments show that PETS consistently outperforms uniform allocation. On GPQA, PETS achieves perfect self-consistency in both settings while reducing the sampling budget by up to 75% (offline) and 55% (online) relative to uniform allocation. Code is available at https://github.com/ZDCSlab/PETS.
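
The self-consistency rate described above can be illustrated with a small simulation. The sketch below is not the PETS algorithm; it only estimates, under uniform allocation, how often a budget-limited majority vote agrees with the infinite-budget majority vote. It assumes each question is summarized by a hypothetical answer-probability vector `theta` (so the infinite-budget majority is the most probable answer) and that ties go to the smallest answer label; the function names are illustrative and do not come from the paper or its repository.

```python
import random
from collections import Counter


def majority_vote(samples):
    """Return the most frequent answer.

    Ties are broken in favor of the smallest answer label
    (an assumption made for this sketch).
    """
    counts = Counter(samples)
    top = max(counts.values())
    return min(answer for answer, count in counts.items() if count == top)


def self_consistency_rate(thetas, budget, trials=2000, seed=0):
    """Monte Carlo estimate of agreement with the infinite-budget majority.

    thetas: list of per-question answer-probability vectors (hypothetical inputs).
    budget: number of trajectories sampled per question (uniform allocation).
    """
    rng = random.Random(seed)
    agree = 0
    total = 0
    for _ in range(trials):
        for theta in thetas:
            answers = list(range(len(theta)))
            # With unlimited samples, the majority vote converges to the
            # most probable answer under theta.
            target = max(answers, key=lambda a: theta[a])
            samples = rng.choices(answers, weights=theta, k=budget)
            agree += majority_vote(samples) == target
            total += 1
    return agree / total


if __name__ == "__main__":
    questions = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]]
    for b in (1, 5, 25):
        print(b, round(self_consistency_rate(questions, b), 3))
```

Under this toy model, questions whose `theta` is close to uniform need far more samples to reach the same agreement, which is the intuition behind adapting the budget to question difficulty.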

Paper Structure (47 sections, 10 theorems, 160 equations, 17 figures, 3 tables, 2 algorithms)

Introduction
Setup
Offline PETS in the Batch Setting
Modeling as an MDP process.
Approximation of dynamic programming.
Online PETS in the Streaming Setting
Execution Protocol
Optimal Budget Allocation
Connection with the Offline Case
Experiment
PETS for Offline Budget Allocation
PETS for Online Budget Allocation
Related Work
Test-Time Scaling.
Efficient Reasoning.
...and 32 more sections

Key Result

Lemma 3.1

Given the terminal belief $S^H$, the Bayes-optimal decision for each question is therefore

Figures (17)

Figure 1: In this paper, we study how to allocate an LLM’s sampling budget across questions to best match the full-budget outcome under self-consistency. Our results show that PETS substantially reduces the required budget while maintaining accuracy.
Figure 2: Budget allocation plan of the offline and online settings on 9 simulated binary-choice questions, $\mathcal{Y}=\{1,2\}$. Each question is associated with a $\theta=\max(\theta_1,\theta_2)$, and a larger $\theta$ indicates an easier question.
Figure 3: Budget allocation curve in the offline setting. "(conf)" denotes the trace confidence-weighted variant. Consistency is computed within each matched-variant comparison group: PETS-Offline vs. Uniform, and PETS-Offline (conf) vs. Uniform (conf).
Figure 4: Budget allocation curve in the online setting. "(conf)" denotes the trace confidence-weighted variant. Consistency is computed within each matched-variant comparison group: PETS-Online vs. Uniform, and PETS-Online (conf) vs. Uniform (conf). The Oracle variant assumes access to the latent parameter $\bm\theta$, whereas in the online setting $\bm\theta$ is learned from a training dataset.
Figure 5: Probability that multinomial majority voting selects the true best option as a function of budget. We plot $\mathbb{P}\!\left(Y^{\mathrm{Maj}}(B)=\arg\max_{y\in\mathcal{Y}}\theta_y\right)$ versus $\textsc{Budget}\in\{1,\ldots,64\}$ for different ground-truth preference vectors $\bm{\theta}=(\theta_1,\ldots,\theta_M)$. Gray curves (Exact) are computed from the true $\bm{\theta}$ (exact for $M=2,4$; Monte Carlo estimates for $M=10$), while green dotted curves (Probit) are produced by fitting a two-parameter probit model and evaluating the fitted model across budgets. The fitted probit curves closely track the exact/MC curves in all regimes. Panels correspond to $M\in\{2,4,10\}$. For $M=4$, we use: dominate $[0.8,0.1,0.1,0]$, head-heavy $[0.6,0.3,0.05,0.05]$, linear $[0.4,0.3,0.2,0.1]$, flat $[0.3,0.25,0.25,0.2]$, and uniform $[0.25,0.25,0.25,0.25]$. For $M=10$, we use: dominate $[0.8,0.15,0.05,0,\ldots,0]$, head-heavy $[0.25,0.15,0.10,0.0714,\ldots,0.0714]$, linear $[10,9,\ldots,1]/55$, flat $\mathrm{normalize}([10,9,\ldots,1]^{0.4})$, and uniform $[0.1,\ldots,0.1]$.
...and 12 more figures
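
For the binary case ($M=2$), the quantity plotted in Figure 5 has a closed form as a binomial tail sum. The following is a minimal sketch of that computation, assuming `p` stands for $\theta=\max(\theta_1,\theta_2)$ and that ties at even budgets are split evenly between the two options. The tie rule and the function name are assumptions for illustration; the paper's figure may use a different convention.

```python
from math import comb


def majority_correct_prob(p, budget):
    """Probability that a majority vote over `budget` i.i.d. samples picks
    the more likely of two answers, where that answer has probability p.

    Ties (possible only for even budgets) are counted as correct with
    probability 1/2, an assumption made for this sketch.
    """
    prob = 0.0
    for k in range(budget + 1):
        # Binomial probability of exactly k votes for the more likely answer.
        mass = comb(budget, k) * p**k * (1 - p) ** (budget - k)
        if 2 * k > budget:
            prob += mass
        elif 2 * k == budget:
            prob += 0.5 * mass
    return prob


if __name__ == "__main__":
    for theta in (0.55, 0.7, 0.9):
        curve = [majority_correct_prob(theta, b) for b in (1, 3, 9, 33)]
        print(theta, [round(v, 3) for v in curve])
```

With $p=0.8$ and a budget of 3, the sum is $0.8^3 + 3\cdot 0.8^2\cdot 0.2 = 0.896$, which shows how quickly easy questions saturate compared with questions where $p$ is near $0.5$.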

Theorems & Definitions (22)

Lemma 3.1: Bayes-optimal terminal decision
Theorem 4.1
Proof of Lemma 3.1
Lemma B.1
Proof of Lemma \ref{lem:concave}
Proof of Theorem \ref{thm:optimal}
Proposition B.2: Gaussian-probit approximation with $1/\sqrt n$ rate
Proof of Proposition B.2
Remark B.3
Lemma B.4: Concavity of $g_{a,b}$ for $k\geqslant k_{\min}$
...and 12 more
