Truthful Reverse Auctions for Adaptive Selection via Contextual Multi-Armed Bandits

Pronoy Patra, Sankarshan Damle, Manisha Padala, Sujit Gujar

TL;DR

This work develops a truthful reverse-auction framework, built on contextual multi-armed bandits, for adaptive LLM selection in which a user elicits private costs from providers. The authors introduce ROSA, a reverse self-resampling procedure that preserves Bayesian incentive compatibility in reverse auctions, and couple it with TRCM-UCB_OPT, a contextual MAB learner that maintains allocation monotonicity. They prove ex-post truthfulness and no-loss guarantees for the mechanism and establish sublinear, $O(\sqrt{T})$, regret in stochastic settings. The approach unifies mechanism design with online learning to enable efficient, query-aware, provider-optimal LLM allocation, and is validated by simulations under Gaussian and Exponential reward models. The framework lays the groundwork for truthful, incentive-compatible marketplace designs in multi-model AI ecosystems.

Abstract

We study the problem of selecting large language models (LLMs) for user queries in settings where multiple LLM providers submit the cost of solving a query. From the users' perspective, choosing an optimal model is a sequential, query-dependent decision problem: high-capacity models offer more reliable outputs but are costlier, while lightweight models are faster and cheaper. We formalize this interaction as a reverse auction design problem with contextual online learning, where the user adaptively discovers which model performs best while eliciting costs from competing LLM providers. Existing multi-armed bandit (MAB) mechanisms focus on forward auctions and social welfare, leaving open the challenges of reverse auctions, provider-optimal outcomes, and contextual adaptation. We address these gaps by designing a resampling-based procedure that generalizes truthful forward MAB mechanisms to reverse auctions and prove that any monotone allocation rule with this procedure is truthful. Using this, we propose a contextual MAB algorithm that learns query-dependent model quality with sublinear regret. Our framework unifies mechanism design and adaptive learning, enabling efficient, truthful, and query-aware LLM selection.
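
To make the link between a monotone allocation rule and truthfulness concrete, the following is a minimal sketch of a static reverse (procurement) auction: the lowest bid wins and is paid the second-lowest bid. This is the textbook second-price rule, not the paper's ROSA procedure or its learning algorithm; the function names `reverse_second_price` and `utility`, the example costs, and the bid grid are all assumptions made for illustration.

```python
from typing import List, Tuple


def reverse_second_price(bids: List[float]) -> Tuple[int, float]:
    """Lowest bid wins and is paid the second-lowest bid (its critical bid).

    The allocation is monotone: lowering a bid can never turn a winner
    into a loser. Ties are broken in favor of the lower index.
    """
    if len(bids) < 2:
        raise ValueError("need at least two providers")
    order = sorted(range(len(bids)), key=lambda i: (bids[i], i))
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]


def utility(true_cost: float, provider: int, bids: List[float]) -> float:
    """Provider utility: payment minus true cost if selected, else zero."""
    winner, payment = reverse_second_price(bids)
    return payment - true_cost if winner == provider else 0.0


# Hypothetical private costs of three providers (illustrative values only).
true_costs = [0.30, 0.50, 0.80]

# Utility of provider 0 when everyone reports truthfully.
truthful = utility(true_costs[0], 0, list(true_costs))

# Utility of provider 0 under each misreport on a coarse grid in [0, 1],
# with the other providers still reporting truthfully.
grid = [k / 20 for k in range(21)]
best_deviation = max(
    utility(true_costs[0], 0, [b] + true_costs[1:]) for b in grid
)

print(f"truthful utility:       {truthful:.2f}")
print(f"best misreport utility: {best_deviation:.2f}")
```

In this example no misreport on the grid improves on truthful bidding, because the winner's payment depends only on the competing bids. The setting described in the abstract is harder: model quality is unknown and must be learned online, which is why a resampling procedure is needed on top of a monotone allocation rule.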

Paper Structure

This paper contains 40 sections, 14 theorems, 67 equations, 8 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

A mechanism $(\mathcal{A}, \mathcal{P})$ is BIC iff it satisfies the following two conditions:
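
The statement is cut off here and the paper's own wording of the two conditions is not available in this excerpt. For orientation only, the standard textbook form of Myerson's characterization, adapted to a reverse auction, is sketched below. The notation is an assumption: $a_i(c_i)$ and $p_i(c_i)$ denote provider $i$'s expected allocation and expected payment when reporting cost $c_i \in [\underline{c}, \bar{c}]$, and $u_i(c_i) = p_i(c_i) - c_i\, a_i(c_i)$ is the resulting expected utility. The paper's exact formulation may differ.

```latex
% Standard reverse-auction form of Myerson's conditions (assumed notation).
% (1) Monotonicity: a higher reported cost never raises the chance of selection.
a_i(c_i) \ \text{is non-increasing in } c_i .

% (2) Payment identity: follows from the envelope condition u_i'(c_i) = -a_i(c_i).
p_i(c_i) \;=\; c_i\, a_i(c_i) \;+\; \int_{c_i}^{\bar{c}} a_i(z)\, dz \;+\; u_i(\bar{c}) .
```

The integral term is the information rent paid to a provider whose cost is below $\bar{c}$; setting $u_i(\bar{c}) = 0$ gives the smallest payments consistent with individual rationality.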

Figures (8)

  • Figure : (a) Averaged cumulative realized regret with scaled $\sqrt{T}$ reference.
  • Figure : (a) Averaged cumulative realized regret with scaled $\sqrt{T}$ reference.
  • Figure : (a) Averaged cumulative realized regret with scaled $\sqrt{T}$ reference.
  • Figure : (b) Realized regret per round showing convergence toward zero.
  • Figure : (c) Cumulative actual revenue vs. clairvoyant benchmark.
  • ...and 3 more figures

Theorems & Definitions (33)

  • Definition 1: Bayesian Incentive Compatible (BIC) (Nisan, Roughgarden, Tardos, and Vazirani, 2007)
  • Definition 2: Ex Post Individual Rationality (EPIR) (Nisan, Roughgarden, Tardos, and Vazirani, 2007)
  • Theorem 1: Myerson’s Characterization Theorem for Reverse Auction (Myerson, 1981)
  • Theorem 2: Optimal Reverse Auction
  • Definition 3: Ex Post Monotone
  • Definition 4: Reverse Self-Resampling Procedure
  • Proposition 1
  • proof
  • Definition 5: Truthfulness in Expectation (Ex-post Incentive Compatibility, EPIC)
  • Definition 6: Universal Ex-post Individual Rationality (EPIR)
  • ...and 23 more