
Asymptotically Optimal Sequential Testing with Heterogeneous LLMs

Guokai Li, Alys Liang, Mo Liu, Murray Lei, Stefanus Jasin, Fenghua Yang, Preet Baxi

Abstract

We study a Bayesian binary sequential hypothesis testing problem with multiple large language models (LLMs). Each LLM $j$ has per-query cost $c_j>0$, random waiting time with mean $\mu_j>0$ and sub-Gaussian tails, and \emph{asymmetric} accuracies: the probability of returning the correct label depends on the true hypothesis $\theta\in\{A,B\}$ and need not be the same under $A$ and $B$. This asymmetry induces two distinct information rates $(I_{j,A}, I_{j,B})$ per LLM, one under each hypothesis. The decision-maker chooses LLMs sequentially, observes their noisy binary answers, and stops when the posterior probability of one hypothesis exceeds $1-\alpha$. The objective is to minimize the sum of expected query cost and expected waiting cost, $\mathbb{E}[C_\pi] + \mathbb{E}[g(W_\pi)]$, where $C_\pi$ is the total query cost, $W_\pi$ is the total waiting time, and $g$ is a polynomial function (e.g., $g(x)=x^\rho$ with $\rho\ge 1$). We prove that as the error tolerance $\alpha\to 0$, the optimal policy is asymptotically equivalent to one that uses at most two LLMs. A single-LLM policy is \emph{not} generically optimal: optimality requires exploiting a two-dimensional tradeoff between information under $A$ and information under $B$. Any admissible policy induces an expected information-allocation vector in $\mathbb{R}_+^2$, and we show that when $\alpha$ is sufficiently small, the optimal allocation lies at an extreme point of the associated convex set and hence uses at most two LLMs. We construct belief-dependent policies that first mix between two LLMs when the posterior is ambiguous, and then switch to a single "specialist" LLM once the posterior is sufficiently close to one of the hypotheses. These policies match the universal lower bound up to a $(1+o(1))$ factor as $\alpha\to 0$.
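The sampling-and-stopping loop in the abstract can be sketched in a short simulation. This is a minimal illustration, not the paper's policy: the accuracies and costs below are hypothetical, the LLM-selection rule is a naive placeholder (query the specialist for the currently more likely hypothesis), and the belief-dependent two-LLM mixing the paper constructs is not implemented.

```python
import random

# Hypothetical LLM pool (illustrative numbers, not from the paper):
# (per-query cost c_j, P(correct | theta = A), P(correct | theta = B)).
LLMS = [
    (1.0, 0.90, 0.60),  # better at confirming A (an "A-specialist")
    (1.0, 0.60, 0.90),  # better at confirming B (a "B-specialist")
]

def posterior_update(p_a, answer_is_a, acc_a, acc_b):
    """One Bayes step on P(theta = A) after observing a binary answer.

    Under theta = A the LLM answers 'A' with prob. acc_a; under theta = B
    it answers 'A' with prob. 1 - acc_b (asymmetric error rates).
    """
    like_a = acc_a if answer_is_a else 1.0 - acc_a
    like_b = (1.0 - acc_b) if answer_is_a else acc_b
    num = p_a * like_a
    return num / (num + (1.0 - p_a) * like_b)

def run_test(theta_is_a, alpha=0.01, prior=0.5, seed=0):
    """Query LLMs until the posterior of one hypothesis exceeds 1 - alpha."""
    rng = random.Random(seed)
    p_a, cost = prior, 0.0
    while alpha < p_a < 1.0 - alpha:
        # Placeholder selection rule: specialist for the likelier hypothesis.
        c, acc_a, acc_b = LLMS[0] if p_a >= 0.5 else LLMS[1]
        # Simulate the chosen LLM's noisy binary answer.
        correct = rng.random() < (acc_a if theta_is_a else acc_b)
        answer_is_a = correct if theta_is_a else not correct
        p_a = posterior_update(p_a, answer_is_a, acc_a, acc_b)
        cost += c
    return ("A" if p_a >= 1.0 - alpha else "B"), cost
```

For instance, a single "A" answer from the first LLM moves a flat prior to $0.45/0.65 \approx 0.69$, since the likelihoods of an "A" answer under the two hypotheses are $0.9$ and $0.4$.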


Paper Structure

This paper contains 39 sections, 22 theorems, 274 equations, 1 figure, 1 table.

Key Result

Lemma 1

Suppose the output realization depends only on the chosen LLM and the ground truth $\theta$. Then for each time $t=1,2,\dots$, the posterior is determined by the accumulated log-likelihood ratio. Moreover, the stopping time and the decision rule admit an equivalent threshold form: the policy stops and declares a hypothesis once the log-likelihood ratio exits the interval $(-B_\alpha, A_\alpha)$, where $A_\alpha := \log\frac{1-\alpha}{\alpha} - \delta$ and $B_\alpha := \log\frac{1-\alpha}{\alpha} + \delta$. $\blacktriangleleft$
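The equivalence between the posterior-threshold stopping rule and the log-likelihood-ratio thresholds can be checked numerically. This is a sketch under the assumption that $\delta$ is the prior log-odds of hypothesis $A$; the function name and numbers are illustrative, not from the paper.

```python
import math

def posterior_from_llr(prior_a, z):
    """Posterior P(theta = A) given prior P(theta = A) and accumulated
    log-likelihood ratio z = log dP_A/dP_B of the observations."""
    log_odds = math.log(prior_a / (1.0 - prior_a)) + z
    return 1.0 / (1.0 + math.exp(-log_odds))

alpha, prior_a = 0.01, 0.6
delta = math.log(prior_a / (1.0 - prior_a))          # prior log-odds
a_alpha = math.log((1.0 - alpha) / alpha) - delta    # threshold A_alpha

# Posterior of A exceeds 1 - alpha exactly when the LLR reaches A_alpha:
# check agreement just below and just above the threshold.
for z in (a_alpha - 1e-6, a_alpha + 1e-6):
    assert (posterior_from_llr(prior_a, z) >= 1.0 - alpha) == (z >= a_alpha)
```

The symmetric check for hypothesis $B$ replaces $A_\alpha$ with $-B_\alpha$: the extra $\pm\delta$ in the two thresholds offsets the prior tilt toward $A$.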

Theorems & Definitions (36)

  • Lemma 1
  • Definition 1: Admissible Policy
  • Remark 1
  • Remark 2: No monotonicity assumptions
  • Definition 2: Sign-based two-LLM Sequential Policy
  • Theorem 1: Asymptotic optimality and tightness of the convergence rate
  • Lemma 2
  • Lemma 3
  • Proposition 1
  • Lemma 4: Tightness of information constraints
  • ...and 26 more