Multi-LLM Query Optimization

Arlen Dean, Zijin Zhang, Stefanus Jasin, Yuqing Liu

Abstract

Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is a common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints that guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form, multiplicatively separable expression in the query counts and is guaranteed to be feasibility-preserving. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of surrogate-optimal cost to true optimal cost converges to one as error tolerances shrink, with an explicit rate of $O\left(\log\log(1/\alpha_{\min}) / \log(1/\alpha_{\min})\right)$. Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan within a $(1+\varepsilon)$ factor of the surrogate optimum.
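
To make the surrogate idea concrete, the following is a minimal sketch of a union-bound-plus-Chernoff error estimate. It is not the paper's exact construction. It assumes that each model's responses are conditionally independent given the true label, that the decision rule is maximum likelihood, and it fixes the Chernoff exponent at 1/2 (the Bhattacharyya coefficient) rather than optimizing it. The helper names (`symmetric_model`, `surrogate_error`, `plan_is_surrogate_feasible`, `plan_cost`) and the example accuracies and costs are hypothetical.

```python
import math


def symmetric_model(accuracy, num_labels):
    """Hypothetical model: correct with prob. `accuracy`, errors spread evenly."""
    off = (1.0 - accuracy) / (num_labels - 1)
    return [[accuracy if o == y else off for o in range(num_labels)]
            for y in range(num_labels)]


def bhattacharyya(p, q):
    """Chernoff-type affinity at exponent 1/2 between two output distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def surrogate_error(models, counts, true_label):
    """Union bound over rival labels of the per-pair Chernoff-type bound.

    models[k][y] is model k's output distribution when the truth is y;
    counts[k] is the number of queries sent to model k. Each pairwise
    term is a product over models, i.e. multiplicatively separable in
    the query counts.
    """
    total = 0.0
    for rival in range(len(models[0])):
        if rival == true_label:
            continue
        pairwise = 1.0
        for dists, n in zip(models, counts):
            pairwise *= bhattacharyya(dists[true_label], dists[rival]) ** n
        total += pairwise
    return total


def plan_is_surrogate_feasible(models, counts, alphas):
    """Check the statewise constraint for every possible true label."""
    return all(surrogate_error(models, counts, y) <= alphas[y]
               for y in range(len(alphas)))


def plan_cost(costs, counts):
    """Total query cost of a plan (linear in the query counts)."""
    return sum(c * n for c, n in zip(costs, counts))


# Illustrative instance: two models over three labels (made-up numbers).
models = [symmetric_model(0.8, 3), symmetric_model(0.6, 3)]
costs = [1.0, 0.5]
alphas = [0.05, 0.05, 0.05]
```

Because each bound is an upper bound on the true statewise error under the stated assumptions, any plan that passes `plan_is_surrogate_feasible` also satisfies the original error constraints, which is the sense in which such a surrogate preserves feasibility.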

Paper Structure

This paper contains 26 sections, 25 theorems, 101 equations.

Key Result

Theorem 1

The query design problem (eq:opt_problem) is NP-hard.

Theorems & Definitions (53)

  • Theorem 1: NP-Hardness
  • Definition 1: Log-likelihood difference
  • Lemma 1: Union bound reduction to pairwise comparisons
  • Definition 2: Chernoff affinity factor
  • Proposition 1: Pairwise Chernoff bound
  • Theorem 2: Statewise Error Upper Bound
  • Definition 3: Optimal costs
  • Corollary 1: Surrogate conservatism
  • Theorem 3: Optimization-level tightness
  • Proposition 2: Reformulation
  • ...and 43 more