
When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani

TL;DR

The paper addresses reliable LLM reasoning by combining fast, cheap weak verification with selective, costly strong verification. It formalizes weak–strong verification policies, identifies calibration and sharpness as the key properties of a weak verifier, and proves that optimal policies have a two-threshold structure. It then presents Selective Strong Verification (SSV), an online, distribution-free algorithm that keeps type-I and type-II error rates at target levels while controlling how often strong verification is used, with finite-sample guarantees. Experiments on outcome-level math verification and step-by-step Sudoku reasoning show that SSV navigates the accuracy–cost frontier in a principled way, achieving near-oracle performance with substantially fewer strong-verifier calls. By spending verification effort where it is most needed, under explicit error constraints, the framework supports scalable, trustworthy reasoning in high-stakes domains.
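The orchestration described above can be sketched as a three-way routing rule. This is a minimal illustration, not the paper's implementation: the function name `weak_strong_policy`, the assumption that weak scores lie in [0, 1], and the `strong_verify` callback (a stand-in for any costly check, such as human review or an exact answer checker) are all hypothetical.

```python
from typing import Callable, Literal, Tuple

Decision = Literal["accept", "reject"]


def weak_strong_policy(
    weak_score: float,
    t_low: float,
    t_high: float,
    strong_verify: Callable[[], bool],
) -> Tuple[Decision, bool]:
    """Route one output with a two-threshold rule (illustrative sketch).

    Returns (decision, used_strong). Scores at or above t_high are accepted
    and scores at or below t_low are rejected on the weak signal alone;
    anything in between is deferred to the costly strong verifier.
    """
    if not 0.0 <= t_low <= t_high <= 1.0:
        raise ValueError("expected 0 <= t_low <= t_high <= 1")
    if weak_score >= t_high:
        return "accept", False  # trust the cheap check
    if weak_score <= t_low:
        return "reject", False  # cheap check is confidently negative
    # Ambiguous region: pay for one strong-verification call and follow it.
    return ("accept" if strong_verify() else "reject"), True
```

For instance, `weak_strong_policy(0.5, 0.2, 0.8, lambda: True)` falls between the thresholds, calls the strong verifier once, and returns `("accept", True)`.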

Abstract

Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which we call strong verification. These signals differ sharply in cost and reliability: strong verification can establish trust but is resource-intensive, while weak verification is fast and scalable but noisy and imperfect. We formalize this tension through weak–strong verification policies, which decide when to accept or reject based on weak verification and when to defer to strong verification. We introduce metrics capturing incorrect acceptance, incorrect rejection, and strong-verification frequency. At the population level, we show that optimal policies admit a two-threshold structure and that calibration and sharpness govern the value of weak verifiers. Building on this, we develop an online algorithm that provably controls acceptance and rejection errors without assumptions on the query stream, the language model, or the weak verifier.
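The three quantities named above can be estimated on a labeled batch. The sketch below assumes a two-threshold policy, a perfect strong verifier (so deferred items are always decided correctly), and rates divided by the total number of queries; the paper may normalize differently (for example, conditioning on incorrect or on correct outputs), and the name `verification_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def verification_metrics(
    scores: Sequence[float],
    correct: Sequence[bool],
    t_low: float,
    t_high: float,
) -> Dict[str, float]:
    """Empirical rates of a two-threshold weak-strong policy on a labeled batch.

    Assumes the strong verifier is exact, so deferred items are never
    decided wrongly; every rate is divided by the number of queries.
    """
    n = len(scores)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    false_acc = false_rej = deferred = 0
    for s, is_correct in zip(scores, correct):
        if s >= t_high:  # accepted on the weak signal
            false_acc += not is_correct
        elif s <= t_low:  # rejected on the weak signal
            false_rej += is_correct
        else:  # deferred to strong verification
            deferred += 1
    return {
        "incorrect_acceptance": false_acc / n,
        "incorrect_rejection": false_rej / n,
        "strong_frequency": deferred / n,
    }
```

With scores `[0.95, 0.9, 0.5, 0.1, 0.05]`, labels `[True, False, True, True, False]`, and thresholds 0.2 and 0.8, one item is wrongly accepted, one is wrongly rejected, and one is deferred, so each rate is 0.2.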

Paper Structure (33 sections, 3 theorems, 63 equations, 14 figures, 8 tables, 1 algorithm)

Key Result

Theorem 4.2

Suppose the calibration assumption holds. For any $\lambda_1,\lambda_2 \ge 0$, there exists an optimal policy $\pi^\star(\lambda_1,\lambda_2)$ with a threshold structure: there exist thresholds $t_{\mathrm{low}}, t_{\mathrm{high}}\in[0,1]$ such that
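A small calculation shows how thresholds of this kind can arise. The sketch assumes (i) calibration in the sense $P(\text{correct} \mid \text{score} = s) = s$ and (ii) a pointwise Lagrangian-style cost in which accepting costs $\lambda_1$ times the chance of an incorrect acceptance, rejecting costs $\lambda_2$ times the chance of an incorrect rejection, and deferring costs one unit for an exact strong-verifier call. This cost model and the name `optimal_thresholds` are illustrative assumptions, not the paper's exact objective.

```python
from typing import Tuple


def optimal_thresholds(lam1: float, lam2: float) -> Tuple[float, float]:
    """(t_low, t_high) minimizing an assumed pointwise cost for a calibrated score s.

    Assumed per-item costs, with s = P(correct | weak score):
        accept -> lam1 * (1 - s)   (weighted chance of an incorrect acceptance)
        reject -> lam2 * s         (weighted chance of an incorrect rejection)
        defer  -> 1                (one exact strong-verifier call)
    """
    if lam1 < 0 or lam2 < 0:
        raise ValueError("multipliers must be non-negative")
    if lam1 == 0:
        return 0.0, 0.0  # accepting is free, so accept everything
    if lam2 == 0:
        return 1.0, 1.0  # rejecting is free, so reject unless s = 1
    cross = lam1 / (lam1 + lam2)  # accept beats reject iff s >= cross
    t_high = max(cross, 1.0 - 1.0 / lam1)  # accept must also beat defer
    t_low = min(cross, 1.0 / lam2)  # reject must also beat defer
    return t_low, t_high
```

Under these assumptions, $\lambda_1 = \lambda_2 = 4$ gives $(0.25, 0.75)$, so scores strictly between them go to strong verification; $\lambda_1 = \lambda_2 = 1$ gives $(0.5, 0.5)$, an empty defer region, because no weighted error ever costs as much as a full strong call.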

Figures (14)

  • Figure 1: The architecture of weak-strong verification for LLM reasoning.
  • Figure 2: Empirical Error Rate Convergence. Running-average error rates $\frac{1}{T}\sum_{t=1}^T \mathrm{err}_t$ for target levels $\alpha=\beta=0.15$. Left and Center (MATH): Convergence of Type-I and Type-II errors across three difficulty levels in the outcome-level verification task. Right (Sudoku): Convergence in the sequential step-by-step reasoning task.
  • Figure 3: Reasoning Accuracy vs. Verification Cost Tradeoffs. SSV (Algorithm 1; Adaptive: solid colored lines) interpolates between the Strong-Only Oracle (black star) and the Weak-Only baselines (colored circles). Left (MATH): Tradeoff curves for Easy, Medium, and Hard problems; Right (Sudoku): Tradeoff curve for the Sudoku step-by-step reasoning task. Points are labeled with nominal error targets, where $\alpha = \beta$.
  • Figure 4: Best-of-$n$ (MATH): error convergence for $\alpha=\beta=0.05$.
  • Figure 5: Best-of-$n$ (MATH): error convergence for $\alpha=\beta=0.10$.
  • ...and 9 more figures

Theorems & Definitions (15)

  • Theorem 4.2: Structure of an optimal policy
  • Proposition 4.3: Value of the optimal objective
  • Theorem 5.1: Finite-time empirical error control
  • proof
  • Claim A.1: Unconditional and pointwise form
  • proof
  • proof
  • Claim A.2: Unbiased importance weighting
  • proof
  • Claim A.3: Telescoping inequalities from the threshold updates
  • ...and 5 more
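The claim titles above (unbiased importance weighting, telescoping inequalities from threshold updates) hint at the flavor of the online analysis. The sketch below is a generic illustration in that spirit and is not SSV as specified in the paper: it audits weak accept/reject decisions with a fixed probability `audit_prob`, forms inverse-propensity estimates of incorrect acceptance and rejection, and moves each threshold by a step of size `lr` toward its target. All names and defaults are hypothetical, and weak scores are assumed to lie in [0, 1]. Under that assumption `t_high` never exceeds `1 + lr/audit_prob` and `t_low` never drops below `-lr/audit_prob`, so summing the updates (a telescoping argument) bounds the running average of the *estimated* errors by the target plus O(1/(lr·T)); rounds with no estimated error, including every deferral, lower `t_high` and raise `t_low`, shrinking the defer region and keeping strong-verifier usage in check.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OnlineTwoThreshold:
    """Illustrative online two-threshold controller (not the paper's SSV)."""

    alpha: float  # target rate of incorrect acceptance
    beta: float  # target rate of incorrect rejection
    lr: float = 0.05  # step size for threshold updates
    audit_prob: float = 0.1  # chance of auditing a weak accept/reject
    t_low: float = 0.3
    t_high: float = 0.7
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    strong_calls: int = 0
    rounds: int = 0
    est_acc_sum: float = 0.0  # running sum of IW false-accept estimates
    est_rej_sum: float = 0.0  # running sum of IW false-reject estimates

    def __post_init__(self) -> None:
        if not 0.0 < self.audit_prob <= 1.0:
            raise ValueError("audit_prob must lie in (0, 1]")

    def step(self, score: float, strong_verify: Callable[[], bool]) -> str:
        """Decide one query, then update both thresholds."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must lie in [0, 1]")
        self.rounds += 1
        err_acc = err_rej = 0.0
        if score >= self.t_high:
            decision = "accept"
        elif score <= self.t_low:
            decision = "reject"
        else:
            decision = "defer"

        if decision == "defer":
            # Ambiguous region: pay for strong verification and follow it.
            self.strong_calls += 1
            final = "accept" if strong_verify() else "reject"
        else:
            final = decision
            # Randomized audit: the weak decision stands, and weighting by
            # 1/audit_prob makes each estimate unbiased for the indicator
            # of an incorrect weak decision.
            if self.rng.random() < self.audit_prob:
                self.strong_calls += 1
                is_correct = strong_verify()
                weight = 1.0 / self.audit_prob
                if decision == "accept" and not is_correct:
                    err_acc = weight
                elif decision == "reject" and is_correct:
                    err_rej = weight

        self.est_acc_sum += err_acc
        self.est_rej_sum += err_rej
        # Too many estimated false accepts: raise t_high (accept less).
        # Too many estimated false rejects: lower t_low (reject less).
        self.t_high += self.lr * (err_acc - self.alpha)
        self.t_low -= self.lr * (err_rej - self.beta)
        return final
```

A synthetic check could feed uniform scores with labels drawn as `random() < score` (a calibrated stream) and compare `est_acc_sum / rounds` against `alpha`; the bound concerns the importance-weighted estimates of this sketch, not the paper's Theorem 5.1.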