Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

Jingkai Huang, Will Ma, Zhengyuan Zhou

TL;DR

This work develops a Bayesian stopping framework that uses prior information to infer the most consistent LLM answer efficiently. It introduces an exact posterior for mode identification and a computationally tractable $L$-aggregated posterior, and proves that $L=3$ suffices for asymptotic optimality while reducing the posterior-computation cost from factorial in $K$ to polynomial, $O(K^L)$. The authors extend the approach to uncertain priors by placing a hyper-prior over a candidate set, and show that prior information improves stopping efficiency even when the prior is misspecified or unknown. Experiments on synthetic data and FEval-TTC show that the method cuts the number of required LLM calls by up to roughly 50% while maintaining or improving accuracy, which suggests it can scale to high-reliability inference in math and reasoning tasks.
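To give a feel for the gap between factorial and polynomial growth, the short sketch below compares $K!$ with $K^L$ for a few values of $K$. It assumes that $K$ denotes the number of distinct candidate answers (the summary does not define it) and uses the two expressions only as stand-ins for the order of the cost, ignoring constants. The function name `posterior_cost_comparison` is hypothetical.

```python
import math


def posterior_cost_comparison(K, L=3):
    """Return (K!, K**L) as rough proxies for the two cost orders.

    Assumption: K is the number of distinct candidate answers and L is the
    aggregation level; constants and lower-order terms are ignored.
    """
    return math.factorial(K), K ** L


if __name__ == "__main__":
    for K in (5, 10, 20):
        exact, aggregated = posterior_cost_comparison(K)
        print(f"K={K:>2}  K!={exact:.3e}  K^3={aggregated}")
```

Already at $K=10$ the factorial term is $3{,}628{,}800$ against $1{,}000$ for $K^3$, which is why the aggregated posterior is the practical choice.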

Abstract

A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficient "$L$-aggregated" stopping policy that tracks only the $L-1$ most frequent answer counts. Theoretically, we prove that $L=3$ is all you need: this coarse approximation is sufficient to achieve asymptotic optimality, and strictly dominates prior-free baselines, while having a fast posterior computation. Empirically, this identifies the most consistent (i.e., mode) LLM answer using fewer samples, and can achieve similar answer accuracy while cutting the number of LLM calls (i.e., saving on LLM inference costs) by up to 50%.
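The idea of stopping once the posterior is confident about the mode can be illustrated with a deliberately simplified sketch. It is not the paper's exact or $L$-aggregated policy: it assumes a small finite set of candidate answer distributions with known answer labels (which sidesteps the label-permutation problem that makes the exact posterior expensive), and it stops when the posterior probability that the current leader is the mode reaches $1-\delta$. The names `HYPOTHESES`, `PRIOR`, `bayesian_stop`, and the threshold rule itself are assumptions made for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical candidate answer distributions (all probabilities must be > 0).
HYPOTHESES = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.3, "B": 0.6, "C": 0.1},
    {"A": 0.1, "B": 0.3, "C": 0.6},
]
PRIOR = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform prior over the candidates


def posterior_over_hypotheses(counts, hypotheses, prior):
    """Posterior weight of each candidate distribution given answer counts."""
    log_w = []
    for p, pi in zip(hypotheses, prior):
        log_w.append(math.log(pi) + sum(c * math.log(p[a]) for a, c in counts.items()))
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    return [x / z for x in w]


def bayesian_stop(sample_fn, hypotheses, prior, delta=0.05, max_samples=100):
    """Sample answers until the leader is the mode with posterior prob >= 1 - delta.

    Returns (leading answer, number of samples used, posterior prob of leader).
    """
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_fn()] += 1
        post = posterior_over_hypotheses(counts, hypotheses, prior)
        leader = counts.most_common(1)[0][0]
        # Total posterior mass on hypotheses whose mode equals the leader.
        p_mode = sum(w for w, p in zip(post, hypotheses) if max(p, key=p.get) == leader)
        if p_mode >= 1 - delta:
            break
    return leader, n, p_mode


if __name__ == "__main__":
    rng = random.Random(0)
    truth = HYPOTHESES[0]  # simulate an LLM whose answers follow the first candidate
    draw = lambda: rng.choices(list(truth), weights=list(truth.values()))[0]
    print(bayesian_stop(draw, HYPOTHESES, PRIOR))
```

A prior-free majority vote would use a fixed sample budget regardless of how lopsided the counts are. Here the prior lets the loop stop as soon as the evidence is decisive, which is the source of the savings described above.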

Paper Structure

This paper contains 28 sections, 2 theorems, 43 equations, 7 tables, 3 algorithms.

Key Result

Theorem 3.1

The expected stopping time $\mathbb{E}[n^{\star,L}]$ satisfies

Theorems & Definitions (15)

  • Example 1
  • Remark 1
  • Example 2
  • Remark 2
  • Theorem 3.1: Asymptotic Stopping Time under $L$-Aggregated Posterior Approximation Scheme
  • Remark 3
  • Remark 4
  • Theorem 4.1: Asymptotic Stopping Time with Uncertain Prior
  • Remark 5
  • Remark 6
  • ...and 5 more