
Asymptotically Optimal Sequential Testing with Heterogeneous LLMs

Guokai Li, Alys Liang, Mo Liu, Murray Lei, Stefanus Jasin, Fenghua Yang, Preet Baxi

Abstract

We study a Bayesian binary sequential hypothesis testing problem with multiple large language models (LLMs). Each LLM $j$ has per-query cost $c_j>0$, random waiting time with mean $\mu_j>0$ and sub-Gaussian tails, and \emph{asymmetric} accuracies: the probability of returning the correct label depends on the true hypothesis $\theta\in\{A,B\}$ and need not be the same under $A$ and $B$. This asymmetry induces two distinct information rates $(I_{j,A}, I_{j,B})$ per LLM, one under each hypothesis. The decision-maker chooses LLMs sequentially, observes their noisy binary answers, and stops when the posterior probability of one hypothesis exceeds $1-\alpha$. The objective is to minimize the sum of expected query cost and expected waiting cost, $\mathbb{E}[C_\pi] + \mathbb{E}[g(W_\pi)]$, where $C_\pi$ is the total query cost, $W_\pi$ is the total waiting time, and $g$ is a polynomial function (e.g., $g(x)=x^\rho$ with $\rho\ge 1$). We prove that as the error tolerance $\alpha\to 0$, the optimal policy is asymptotically equivalent to one that uses at most two LLMs. A single-LLM policy is \emph{not} generically optimal: optimality requires exploiting a two-dimensional tradeoff between information under $A$ and information under $B$. Any admissible policy induces an expected information-allocation vector in $\mathbb{R}_+^2$, and we show that when $\alpha$ is sufficiently small, the optimal allocation lies at an extreme point of the associated convex set and hence uses at most two LLMs. We construct belief-dependent policies that first mix between two LLMs when the posterior is ambiguous, and then switch to a single "specialist" LLM once the posterior is sufficiently close to one of the hypotheses. These policies match the universal lower bound up to a $(1+o(1))$ factor as $\alpha\to 0$.
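The sampling-and-stopping loop in the abstract can be sketched in a short simulation. This is a minimal illustration, not the paper's policy: the accuracies and costs below are hypothetical, the LLM-selection rule is a naive placeholder (query the specialist for the currently more likely hypothesis), and the belief-dependent two-LLM mixing the paper constructs is not implemented.

```python
import random

# Hypothetical LLM pool (illustrative numbers, not from the paper):
# (per-query cost c_j, P(correct | theta = A), P(correct | theta = B)).
LLMS = [
    (1.0, 0.90, 0.60),  # better at confirming A (an "A-specialist")
    (1.0, 0.60, 0.90),  # better at confirming B (a "B-specialist")
]

def posterior_update(p_a, answer_is_a, acc_a, acc_b):
    """One Bayes step on P(theta = A) after observing a binary answer.

    Under theta = A the LLM answers 'A' with prob. acc_a; under theta = B
    it answers 'A' with prob. 1 - acc_b (asymmetric error rates).
    """
    like_a = acc_a if answer_is_a else 1.0 - acc_a
    like_b = (1.0 - acc_b) if answer_is_a else acc_b
    num = p_a * like_a
    return num / (num + (1.0 - p_a) * like_b)

def run_test(theta_is_a, alpha=0.01, prior=0.5, seed=0):
    """Query LLMs until the posterior of one hypothesis exceeds 1 - alpha."""
    rng = random.Random(seed)
    p_a, cost = prior, 0.0
    while alpha < p_a < 1.0 - alpha:
        # Placeholder selection rule: specialist for the likelier hypothesis.
        c, acc_a, acc_b = LLMS[0] if p_a >= 0.5 else LLMS[1]
        # Simulate the chosen LLM's noisy binary answer.
        correct = rng.random() < (acc_a if theta_is_a else acc_b)
        answer_is_a = correct if theta_is_a else not correct
        p_a = posterior_update(p_a, answer_is_a, acc_a, acc_b)
        cost += c
    return ("A" if p_a >= 1.0 - alpha else "B"), cost
```

For instance, a single "A" answer from the first LLM moves a flat prior to $0.45/0.65 \approx 0.69$, since the likelihoods of an "A" answer under the two hypotheses are $0.9$ and $0.4$.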


Paper Structure

This paper contains 39 sections, 22 theorems, 274 equations, 1 figure, 1 table.

Key Result

Lemma 1

Suppose the output realization depends only on the chosen LLM and the ground truth $\theta$. Then for each time $t=1,2,\dots$, the posterior is determined by the accumulated log-likelihood ratio. Moreover, the stopping time and the decision rule admit an equivalent threshold form: the policy stops and declares a hypothesis once the log-likelihood ratio exits the interval $(-B_\alpha, A_\alpha)$, where $A_\alpha := \log\frac{1-\alpha}{\alpha} - \delta$ and $B_\alpha := \log\frac{1-\alpha}{\alpha} + \delta$. $\blacktriangleleft$
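The equivalence between the posterior-threshold stopping rule and the log-likelihood-ratio thresholds can be checked numerically. This is a sketch under the assumption that $\delta$ is the prior log-odds of hypothesis $A$; the function name and numbers are illustrative, not from the paper.

```python
import math

def posterior_from_llr(prior_a, z):
    """Posterior P(theta = A) given prior P(theta = A) and accumulated
    log-likelihood ratio z = log dP_A/dP_B of the observations."""
    log_odds = math.log(prior_a / (1.0 - prior_a)) + z
    return 1.0 / (1.0 + math.exp(-log_odds))

alpha, prior_a = 0.01, 0.6
delta = math.log(prior_a / (1.0 - prior_a))          # prior log-odds
a_alpha = math.log((1.0 - alpha) / alpha) - delta    # threshold A_alpha

# Posterior of A exceeds 1 - alpha exactly when the LLR reaches A_alpha:
# check agreement just below and just above the threshold.
for z in (a_alpha - 1e-6, a_alpha + 1e-6):
    assert (posterior_from_llr(prior_a, z) >= 1.0 - alpha) == (z >= a_alpha)
```

The symmetric check for hypothesis $B$ replaces $A_\alpha$ with $-B_\alpha$: the extra $\pm\delta$ in the two thresholds offsets the prior tilt toward $A$.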

Theorems & Definitions (36)

  • Lemma 1
  • Definition 1: Admissible Policy
  • Remark 1
  • Remark 2: No monotonicity assumptions
  • Definition 2: Sign-based two-LLM Sequential Policy
  • Theorem 1: Asymptotic optimality and tightness of the convergence rate
  • Lemma 2
  • Lemma 3
  • Proposition 1
  • Lemma 4: Tightness of information constraints
  • ...and 26 more