PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models

Xiaoyan Hu, Lauren Pick, Ho-fung Leung, Farzan Farnia

TL;DR

PromptWise addresses cost-aware prompt-to-model assignment for generative models by modeling it as a Cost-Aware Contextual Multi-Armed Bandit (CA-CMAB) that supports multiple assignments per prompt and explicitly accounts for service costs. It introduces a UCB-based algorithm (PromptWise) that estimates prompt-model compatibility, prioritizes cheaper models, and escalates to costlier models only when needed, with a kernel-augmented variant (PromptWise-KLR) for non-linear prediction. The approach achieves performance comparable to cost-unaware baselines while substantially reducing total cost, as demonstrated on Sudoku, chess puzzles, code generation/translation, and synthetic text-to-image generation. The work provides a practical framework for budget-conscious deployment of multiple generative models on real-world prompts, with adaptability to new models and prompts and a foundation for extensions to additional modalities and reward-model integrations.

Abstract

The rapid advancement of generative AI has provided users with a wide range of well-trained models to address diverse prompts. When selecting a model for a given prompt, users should weigh not only its performance but also its service cost. However, existing model-selection methods typically emphasize performance while overlooking cost differences. In this paper, we introduce PromptWise, an online learning framework that assigns prompts to generative models in a cost-aware manner. PromptWise estimates prompt-model compatibility to select the least expensive model expected to deliver satisfactory outputs. Unlike standard contextual bandits that make a one-shot decision per prompt, PromptWise employs a cost-aware bandit structure that allows sequential model assignments per prompt to reduce total service cost. Through numerical experiments on tasks such as code generation and translation, we demonstrate that PromptWise can achieve performance comparable to baseline selection methods while incurring substantially lower costs. The code is available at: github.com/yannxiaoyanhu/PromptWise.
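The sequential, cost-aware assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it ignores the prompt context, uses a plain UCB estimate of each model's success rate, ranks models by price divided by that optimistic estimate (an assumed index), and does not retry a model that already failed on the current prompt (a simplification). The model names, prices, and the `solve` callback are hypothetical.

```python
import math

def ucb(successes, pulls, total_pulls, bonus=1.0):
    """Optimistic success-probability estimate, capped at 1."""
    if pulls == 0:
        return 1.0  # untried models are treated optimistically
    mean = successes / pulls
    width = bonus * math.sqrt(math.log(max(total_pulls, 2)) / pulls)
    return min(1.0, mean + width)

def route_prompt(costs, stats, solve, tau_max=3):
    """Query models one at a time until one succeeds or the round budget ends.

    costs: model name -> price per call (hypothetical units)
    stats: model name -> [successes, pulls], updated in place
    solve: callable(model) -> bool, True when the output is satisfactory
    Returns (solved, total cost spent, models tried in order).
    """
    spent, tried = 0.0, []
    for _ in range(tau_max):
        # Simplification: skip models that already failed on this prompt.
        remaining = [m for m in costs if m not in tried]
        if not remaining:
            break
        total = sum(pulls for _, pulls in stats.values())
        # Assumed index: price divided by optimistic success probability.
        model = min(
            remaining,
            key=lambda m: costs[m] / max(ucb(*stats[m], total), 1e-9),
        )
        ok = solve(model)
        stats[model][0] += int(ok)
        stats[model][1] += 1
        spent += costs[model]
        tried.append(model)
        if ok:
            return True, spent, tried
    return False, spent, tried

# Hypothetical setup: only the expensive model can solve this prompt.
costs = {"small": 1.0, "large": 10.0}
stats = {m: [0, 0] for m in costs}
result = route_prompt(costs, stats, solve=lambda m: m == "large")
```

In this toy run the cheap model is queried first and the expensive one is used only after it fails, so the prompt is solved at the combined price of both calls. A contextual version would replace `ucb` with an estimate that depends on the prompt embedding.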

Paper Structure

This paper contains 20 sections, 7 theorems, 28 equations, 29 figures, 2 tables, 4 algorithms.

Key Result

Proposition 1

Under Assumption aspt:reward and with an unlimited round budget, an optimal policy for the objective (obj) is as follows: for any incoming context $x \in {\mathcal{X}}$, it takes action $a^\star(x)$ as long as a reward of 1 has not been observed, which is given by … where $q_a(x)$ and $c_a$ are the success probability conditioned on context $x$ and the (fixed) cost of arm $a \in {\mathcal{A}}$, respectively. The expected tota

Figures (29)

  • Figure 1: Illustration of cost-aware prompt assignment in Sudoku solving, where a single prompt may be routed to multiple models. PromptWise, via online learning, would route easier puzzles to the inexpensive GPT-4o-mini, while for harder instances it may escalate to stronger and more expensive available models GPT-4o and o1.
  • Figure 2: Interaction protocol of standard contextual bandits (top) versus our cost-aware contextual multi-armed bandit (CA-CMAB) framework in PromptWise (bottom). Unlike the standard setting, CA-CMAB enables multiple model assignments per prompt and explicitly balances accuracy (reward) with cumulative service cost, better reflecting practical usage scenarios.
  • Figure 3: Code Completion on HumanEval (Task 2): Gemini-2.5-Flash-preview, Deepseek-Chat, Qwen-Plus, GPT-4o, and Claude-Opus-4. Results are averaged over 20 trials.
  • Figure 4: Code Translation on HumanEval-X Benchmark (Task 3): Gemini-2.5-Flash-preview, Deepseek-Chat, Qwen-Plus, GPT-4o, and Claude-Opus-4. The LLM is provided with C++ (or Java) code and is asked to translate it into Java (or C++). Results are averaged over 20 trials.
  • Figure 5: Trade-offs between cost and success rate when assigning prompts to the models Claude-Sonnet-4 ($18 PMT) and Claude-Opus-4 ($75 PMT): We test the algorithms with different round budgets $\tau_{\max}=1,2,4,8,16$ (darker colors correspond to greater round budgets). The proposed PromptWise method yields comparable success rates while incurring lower costs, achieving better trade-offs between model cost and performance. Note that for the Greedy and Random baselines, the average cost and success rate remain constant across round budget values.
  • ...and 24 more figures

Theorems & Definitions (13)

  • Remark 1
  • Remark 2: Comparison to contextual bandits
  • Proposition 1: Oracle
  • proof: Proof of Proposition 1
  • Theorem 1: Regret of Algorithm \ref{alg:ucb-var}
  • proof
  • proof
  • Lemma 1: pmlr-v70-li17c
  • Lemma 2
  • Lemma 3: pmlr-v70-li17c
  • ...and 3 more