PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models

Xiaoyan Hu; Lauren Pick; Ho-fung Leung; Farzan Farnia

TL;DR

PromptWise addresses cost-aware prompt-to-model assignment for generative models. It formulates the task as a Cost-Aware Contextual Multi-Armed Bandit (CA-CMAB) that allows multiple assignments per prompt and explicitly accounts for service costs. The UCB-based PromptWise algorithm estimates prompt-model compatibility, prioritizes cheaper models, and escalates to costlier models only when needed; a kernel-augmented variant (PromptWise-KLR) supports non-linear prediction. Across tasks including Sudoku, chess puzzles, code generation and translation, and synthetic text-to-image generation, the approach achieves performance comparable to cost-unaware baselines while substantially reducing total cost. The framework targets budget-conscious deployment of multiple generative models on real-world prompts, adapts to new models and prompts, and offers a foundation for extensions to additional modalities and reward-model integration.

Abstract

The rapid advancement of generative AI has provided users with a wide range of well-trained models to address diverse prompts. When selecting a model for a given prompt, users should weigh not only its performance but also its service cost. However, existing model-selection methods typically emphasize performance while overlooking cost differences. In this paper, we introduce PromptWise, an online learning framework that assigns prompts to generative models in a cost-aware manner. PromptWise estimates prompt-model compatibility to select the least expensive model expected to deliver satisfactory outputs. Unlike standard contextual bandits that make a one-shot decision per prompt, PromptWise employs a cost-aware bandit structure that allows sequential model assignments per prompt to reduce total service cost. Through numerical experiments on tasks such as code generation and translation, we demonstrate that PromptWise can achieve performance comparable to baseline selection methods while incurring substantially lower costs. The code is available at: github.com/yannxiaoyanhu/PromptWise.
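
To make the "cheap first, escalate only when needed" idea concrete, the following is a minimal sketch of a cost-aware sequential assignment loop. It is not the authors' implementation: the class name `CostAwareUCB`, the helper `serve`, the use of discrete prompt types in place of learned prompt features, the specific confidence bonus, the cost-per-optimistic-success selection rule, and the restriction to one attempt per model per prompt are all assumptions made for illustration.

```python
import math


class CostAwareUCB:
    """Toy cost-aware learner (hypothetical, for illustration only).

    Tracks a success rate per (prompt type, model) pair and adds a
    UCB-style optimism bonus to each estimate.
    """

    def __init__(self, costs, n_types, bonus=1.0):
        self.costs = list(costs)   # assumed known per-query cost of each model
        self.bonus = bonus         # assumed exploration weight
        self.pulls = [[0] * len(self.costs) for _ in range(n_types)]
        self.wins = [[0] * len(self.costs) for _ in range(n_types)]

    def ucb(self, ptype, model, t):
        """Optimistic success probability, clipped to [0, 1]."""
        n = self.pulls[ptype][model]
        if n == 0:
            return 1.0  # untried pairs are treated optimistically
        mean = self.wins[ptype][model] / n
        return min(1.0, mean + self.bonus * math.sqrt(math.log(t + 1) / n))

    def pick(self, ptype, t, tried):
        """Among untried models, pick the lowest cost per optimistic success."""
        candidates = [m for m in range(len(self.costs)) if m not in tried]
        return min(
            candidates,
            key=lambda m: self.costs[m] / max(self.ucb(ptype, m, t), 1e-9),
        )

    def update(self, ptype, model, success):
        self.pulls[ptype][model] += 1
        self.wins[ptype][model] += int(success)


def serve(learner, ptype, t, query):
    """Query models one at a time until one succeeds or all have been tried.

    `query(ptype, model)` is a caller-supplied function returning True when
    the model's output is judged satisfactory. Returns (solved, total_cost).
    """
    tried, spent = set(), 0.0
    while len(tried) < len(learner.costs):
        model = learner.pick(ptype, t, tried)
        ok = query(ptype, model)
        learner.update(ptype, model, ok)
        spent += learner.costs[model]
        if ok:
            return True, spent
        tried.add(model)
    return False, spent
```

In this sketch, a fresh learner always starts with the cheapest model because every estimate is equally optimistic. Once a cheap model has failed often enough on a prompt type that its optimistic success probability falls below the cost ratio, the learner sends that type directly to the costlier model and skips the wasted cheap query.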
