LLM Based Bayesian Optimization for Prompt Search

Adam Ballew, Jingbo Wang, Shaogang Ren

TL;DR

Prompt sensitivity in LLM-based text classification motivates automated, data-efficient prompt optimization. The authors propose BO-LLM, a Bayesian Optimization framework that uses an LLM-powered Gaussian Process surrogate and a UCB acquisition function to search a discrete prompt space, augmented by an LLM-based expansion module that generates candidates. They formalize data representations, surrogate modeling, and acquisition, and demonstrate competitive results on the LIAR and ETHOS datasets compared with ProTeGi, including an extension to multi-turn clarification QA. The work highlights the potential for principled, sample-efficient prompt engineering while acknowledging challenges from evaluation noise, label-reversal issues, and stability-cost trade-offs, paving the way for scalable, automated prompt optimization in real-world settings.

Abstract

Bayesian Optimization (BO) has been widely used to efficiently optimize expensive black-box functions with limited evaluations. In this paper, we investigate the use of BO for prompt engineering to enhance text classification with Large Language Models (LLMs). We employ an LLM-powered Gaussian Process (GP) as the surrogate model to estimate the performance of different prompt candidates. These candidates are generated by an LLM through the expansion of a set of seed prompts and are subsequently evaluated using an Upper Confidence Bound (UCB) acquisition function in conjunction with the GP posterior. The optimization process iteratively refines the prompts based on a subset of the data, aiming to improve classification accuracy while reducing the number of API calls by leveraging the prediction uncertainty of the LLM-based GP. The proposed BO-LLM algorithm is evaluated on two datasets, and its advantages are discussed in detail in this paper.
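
The selection step described in the abstract (score each candidate prompt with a GP posterior, then pick the one with the highest UCB value) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's surrogate is an LLM-powered GP whose details are not given in this summary, so the sketch substitutes a standard zero-mean GP with an RBF kernel over hypothetical prompt embeddings. The function names, the `noise` and `beta` values, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_obs, y_obs, x_cand, noise=1e-2):
    # Standard GP regression posterior (zero prior mean, unit prior variance).
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oc = rbf_kernel(x_obs, x_cand)
    weights = np.linalg.solve(k_oo, k_oc)           # K^{-1} k_*
    mean = weights.T @ y_obs
    var = 1.0 - np.sum(k_oc * weights, axis=0)      # k(x, x) = 1 for the RBF kernel
    return mean, np.sqrt(np.maximum(var, 0.0))

def ucb_select(mean, std, beta=2.0):
    # Upper Confidence Bound: prefer high predicted accuracy or high uncertainty.
    return int(np.argmax(mean + beta * std))

# Hypothetical 2-D embeddings of already-evaluated prompts and their accuracies.
x_obs = np.array([[0.0, 0.0], [1.0, 0.0]])
y_obs = np.array([0.6, 0.7])
# Candidate 0 duplicates an evaluated prompt; candidate 1 lies far from all data.
x_cand = np.array([[0.0, 0.0], [5.0, 5.0]])

mean, std = gp_posterior(x_obs, y_obs, x_cand)
next_index = ucb_select(mean, std, beta=2.0)
```

In this toy setup the far-away candidate has almost no predicted accuracy but near-maximal uncertainty, so a positive `beta` favors it (exploration), while `beta=0` falls back to the candidate with the highest posterior mean (exploitation). Only the selected prompt would then be scored with real LLM calls, which is how the uncertainty estimate can reduce the number of API requests.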

Paper Structure

This paper contains 46 sections, 14 equations, 5 figures, 1 table, 3 algorithms.

Figures (5)

  • Figure 1: How prompts are transformed from texts to numerical representation.
  • Figure 2: Complete system architecture showing data flow between BO-LLM components.
  • Figure 3: BO-LLM optimization loop.
  • Figure 4: Comparison of BO-LLM and ProTeGi on LIAR and ETHOS datasets over 10 rounds.
  • Figure 5: Surrogate posterior mean and STD of prompts selected by UCB (LIAR dataset).