PAC Reasoning: Controlling the Performance Loss for Efficient Reasoning

Hao Zeng, Jianguo Huang, Bingyi Jing, Hongxin Wei, Bo An

TL;DR

This work tackles the cost of long reasoning in large reasoning models (LRMs) by introducing PAC reasoning, a method that guarantees that the performance loss from switching to a cheaper nonthinking mode stays within a user-specified bound $\varepsilon$ with confidence $1-\alpha$. It builds a composite model $\hat{f}$ that selects between thinking ($f$) and nonthinking ($\tilde{f}$) outputs based on an uncertainty score, calibrating a threshold $\hat{u}$ via an upper confidence bound (UCB) on the cumulative error $L(u)$ computed on a calibration set. The UCB is obtained through importance sampling combined with either a CLT-based or a Hoeffding-based bound, enabling distribution-free guarantees under mild assumptions; the monotonicity of the loss in $u$ is key to deriving a valid threshold. Empirical results on MATH-500, ZebraLogic, and Arena-Hard show that PAC reasoning can reduce inference cost through substantial token savings while keeping the loss within the target bound, with the logits-based uncertainty score offering more stable performance control than the verbalized one. The approach is model-agnostic and provides a principled framework for efficient reasoning with statistical guarantees, contributing to safer and more scalable deployment of LRMs in high-stakes tasks.
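The threshold calibration described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the paper's algorithm: it assumes the loss of the nonthinking output against the thinking output is available for every calibration point (so no importance sampling is needed), that losses lie in $[0, 1]$, and that a low uncertainty score routes to the nonthinking mode. The names `hoeffding_ucb`, `calibrate_threshold`, and `composite_predict` are hypothetical.

```python
import math
from typing import Any, Callable, Sequence


def hoeffding_ucb(mean: float, n: int, alpha: float) -> float:
    """One-sided Hoeffding upper bound for the mean of n i.i.d. values in [0, 1]."""
    return mean + math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def calibrate_threshold(
    scores: Sequence[float],
    losses: Sequence[float],
    epsilon: float,
    alpha: float,
) -> float:
    """Return the largest calibration score u whose UCB on L(u) is <= epsilon.

    L(u) is estimated as (1/n) * sum(loss_i for score_i <= u), which is
    non-decreasing in u because losses are non-negative.
    Returns -inf when no threshold qualifies (always use the thinking mode).
    """
    n = len(scores)
    if n == 0 or n != len(losses):
        raise ValueError("scores and losses must be non-empty and equally long")
    pairs = sorted(zip(scores, losses))
    u_hat = -math.inf
    cumulative = 0.0
    i = 0
    while i < n:
        u = pairs[i][0]
        # Add every calibration point tied at score u before evaluating L(u).
        while i < n and pairs[i][0] == u:
            cumulative += pairs[i][1]
            i += 1
        if hoeffding_ucb(cumulative / n, n, alpha) <= epsilon:
            u_hat = u
        else:
            break  # monotonicity: larger thresholds cannot pass either
    return u_hat


def composite_predict(
    x: Any,
    score_fn: Callable[[Any], float],
    think: Callable[[Any], Any],
    no_think: Callable[[Any], Any],
    u_hat: float,
) -> Any:
    """Use the cheap nonthinking output only when the uncertainty is at most u_hat."""
    return no_think(x) if score_fn(x) <= u_hat else think(x)
```

Because the estimated cumulative loss is non-decreasing in $u$ and the Hoeffding offset is constant, the scan can stop at the first threshold whose bound exceeds $\varepsilon$. With very small calibration sets the offset alone may exceed $\varepsilon$, in which case the sketch returns $-\infty$ and every input is routed to the thinking mode.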

Abstract

Large reasoning models (LRMs) have achieved remarkable progress in complex problem-solving tasks. Despite this success, LRMs typically suffer from high computational costs during deployment, highlighting a need for efficient inference. A popular direction of efficiency improvement is to switch the LRM between thinking and nonthinking modes dynamically. However, such approaches often introduce additional reasoning errors and lack statistical guarantees for the performance loss, which are critical for high-stakes applications. In this work, we propose Probably Approximately Correct (PAC) reasoning that controls the performance loss under the user-specified performance loss tolerance. In particular, we construct an upper confidence bound on the performance loss, formulated as a monotone function of the uncertainty score, and subsequently determine a threshold for switching to the nonthinking model. Theoretically, using the threshold to switch between the thinking and nonthinking modes ensures bounded performance loss in a distribution-free manner. Our comprehensive experiments on reasoning benchmarks show that the proposed method can save computational budgets and control the user-specified performance loss.

Paper Structure

This paper contains 41 sections, 5 theorems, 44 equations, 2 figures, 3 tables, 4 algorithms.

Key Result

Theorem 4

Let $\hat{u}$ be the threshold selected by the PAC reasoning algorithm (alg:pac_reasoning). If the calibration set and the test set are i.i.d. and the validity assumption (assump:validity) holds, then the composite model $\hat{f}$ constructed by that algorithm satisfies the $(\epsilon, \alpha)$-PAC
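
Read against the TL;DR, the kind of guarantee this theorem refers to can be sketched in symbols as follows. The notation is assumed for illustration and may differ from the paper's: $U(x)$ denotes the uncertainty score, $\ell$ the loss between the composite and thinking outputs, and $R(\hat{f})$ the resulting performance loss; the direction of the threshold comparison is also an assumption.

$$
\hat{f}(x) =
\begin{cases}
\tilde{f}(x), & U(x) \le \hat{u}, \\
f(x), & U(x) > \hat{u},
\end{cases}
\qquad
R(\hat{f}) = \mathbb{E}\big[\ell\big(\hat{f}(x), f(x)\big)\big],
\qquad
\mathbb{P}\big(R(\hat{f}) \le \epsilon\big) \ge 1 - \alpha .
$$

Here the probability is taken over the draw of the calibration set, which is what makes $\hat{u}$, and hence $\hat{f}$, random.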

Figures (2)

  • Figure 1: Error control and saved token percentage (STP) of PAC reasoning with binary loss on ZebraLogic at a confidence level of $95\%$. "PAC(Logits)" and "PAC(Verbalized)" denote PAC reasoning using the logits-based score and the verbalized score, respectively. Both keep the performance loss below the target of $0.08$ and save at least $20\%$ of tokens. All experiments are repeated 100 times; other details follow the main experimental setup and the appendix on experimental details.
  • Figure 2: Error control, ECP, and STP of PAC reasoning for semantic loss across three benchmarks at a confidence level of $1-\alpha = 0.95$ ($\alpha = 0.05$). The uncertainty scores are the logits-based score and the verbalized score. All experiments are repeated 100 times, and the shaded areas represent standard deviations.

Theorems & Definitions (15)

  • Definition 1: ($\epsilon, \alpha$)-PAC efficient
  • Remark 2
  • Remark 3
  • Theorem 4: PAC guarantee
  • Theorem 5: Empirical risk PAC guarantee
  • Remark 6
  • proof
  • Proposition 7: Asymptotic validity of UCB based on CLT
  • proof
  • Lemma 8: Conditional Hoeffding bound
  • ...and 5 more