Inference-Aware Prompt Optimization for Aligning Black-Box Large Language Models
Saaduddin Mahmud, Mason Nakamura, Kyle Hollins Wray, Shlomo Zilberstein
TL;DR
This work addresses the misalignment that arises when prompt optimization for black-box LLMs ignores inference-time strategies. It introduces IAPO, a framework that jointly optimizes prompts and inference scaling under user budgets, cast as a contextual best-arm identification problem; a fixed-budget training algorithm, PSST (with a warm-up heuristic), provides finite-budget guarantees. The approach is extended with Top-$K$ screening to improve efficiency in low-budget regimes. Across six diverse tasks, including multi-objective reasoning and summarization, inference-aware optimization consistently improves cost-adjusted performance over inference-agnostic baselines, demonstrating that prompt quality and inference strategy are intrinsically linked. The results have practical implications for reliable, budget-conscious alignment of black-box LLMs and point to future work on richer inference policies and latency-constrained multi-objective deployment.
Abstract
Prompt optimization methods have demonstrated significant effectiveness in aligning black-box large language models (LLMs). In parallel, inference scaling strategies such as Best-of-N Sampling and Majority Voting have been shown to improve alignment and performance by trading additional computation for better output. However, existing prompt optimization approaches are inference-strategy agnostic; that is, they optimize prompts without accounting for the inference strategy used at deployment. This constitutes a significant methodological gap, as our empirical and theoretical analysis reveals a strong interdependence between these two paradigms. Moreover, we find that user preferences regarding trade-offs among multiple objectives and inference budgets substantially influence the choice of prompt and inference configuration. To address this gap, we introduce a novel unified framework named IAPO (Inference-Aware Prompt Optimization) that jointly optimizes the prompt and inference scale, while being aware of the inference budget and different task objectives. We then develop a fixed-budget training algorithm for IAPO, called PSST (Prompt Scaling via Sequential Trimming), and establish finite-budget guarantees on the error probability. Finally, we evaluate the effectiveness of PSST on six tasks, including multi-objective text generation and reasoning, and demonstrate the critical role of incorporating inference-awareness in aligning black-box LLMs using prompt optimization.
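
The interdependence between prompt choice and inference scale can be illustrated with a small, self-contained sketch. It is not the PSST algorithm and does not reproduce any result from the paper: the two prompts, their reward distributions, the linear cost penalty, and the helper names (`expected_best_of_n`, `select_config`) are all assumptions made up for illustration. The sketch shows minimal versions of Best-of-N and Majority Voting, then picks the (prompt, N) pair with the highest cost-adjusted expected reward under a sampling budget.

```python
from collections import Counter


def best_of_n(candidates, score):
    """Best-of-N: return the candidate with the highest score."""
    return max(candidates, key=score)


def majority_vote(answers):
    """Majority Voting: return the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]


def expected_best_of_n(reward_dist, n):
    """Exact expected maximum of n i.i.d. draws from {reward: probability}."""
    expected, cdf_prev = 0.0, 0.0
    for reward in sorted(reward_dist):
        cdf = cdf_prev + reward_dist[reward]
        # P(max == reward) = F(reward)^n - F(previous reward)^n
        expected += reward * (cdf ** n - cdf_prev ** n)
        cdf_prev = cdf
    return expected


def select_config(prompts, scales, budget, cost_weight):
    """Pick the (prompt, N) pair with the best cost-adjusted expected reward."""
    feasible = [(name, n) for name in prompts for n in scales if n <= budget]
    return max(
        feasible,
        key=lambda cfg: expected_best_of_n(prompts[cfg[0]], cfg[1])
        - cost_weight * cfg[1],
    )


# Hypothetical reward distributions, invented for illustration only.
PROMPTS = {
    "steady": {0.6: 1.0},             # always earns reward 0.6
    "risky": {0.2: 0.75, 1.0: 0.25},  # usually poor, occasionally excellent
}
SCALES = [1, 2, 4, 8]

print(select_config(PROMPTS, SCALES, budget=1, cost_weight=0.0))
print(select_config(PROMPTS, SCALES, budget=8, cost_weight=0.0))
print(select_config(PROMPTS, SCALES, budget=8, cost_weight=0.1))
```

Under these assumed numbers, the three calls select `('steady', 1)`, `('risky', 8)`, and `('steady', 1)`. The prompt with the lower single-sample mean becomes the better choice once eight samples are affordable, and the choice flips back when each sample carries a noticeable cost. A prompt tuned only for N = 1 would therefore be the wrong choice in the second setting.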
