Prompt Exploration with Prompt Regression
Michael Feffer, Ronald Xu, Yuekai Sun, Mikhail Yurochkin
TL;DR
This work addresses the challenge of systematizing prompt construction for large language models by introducing Prompt Exploration with Prompt Regression (PEPR). PEPR breaks prompt design into a regression step that predicts the effect of individual library elements on outputs and a subsequent selection step that assembles an effective prompt under a fixed budget, using either log-probability data or human preference signals. The approach relies on an independence of irrelevant alternatives assumption to keep the regression tractable, enabling efficient extrapolation from K observed prompts to 2^K−1 possible combinations, with two formulations: PEPR-R (log-probability) and PEPR-P (preference-based). Across multiple open-source LLMs and datasets (Toy, HateCheck, CAMEL, Natural Instructions) and several model sizes, PEPR-tuned prompts frequently outperform baselines and approach or reach the best possible configurations under limited evaluation budgets, though some libraries show that random selection can occasionally beat model-guided prompts. The work highlights PEPR’s potential to reduce brute-force search in prompt engineering, while noting limitations related to library quality and the linearity assumption, and it points to future directions including richer features, nonlinear models, and broader prompt components. The framework has practical implications for safer, more reliable, and scalable prompt optimization in real-world LLM deployments.
Abstract
With the advent of democratized usage of large language models (LLMs), there is a growing desire to systematize LLM prompt creation and selection beyond iterative trial and error. Prior work largely focuses on searching the space of prompts without accounting for relations between prompt variations. Here we propose a framework, Prompt Exploration with Prompt Regression (PEPR), to predict the effect of prompt combinations given results for individual prompt elements, along with a simple method to select an effective prompt for a given use case. We evaluate our approach with open-source LLMs of different sizes on several tasks.
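To make the regress-then-select idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the log-probability of a reference output under a combined prompt can be approximated as a weighted sum of the log-probabilities obtained with each library element alone, fits those weights by ordinary least squares, and renormalizes them over a chosen subset as a simple stand-in for the independence of irrelevant alternatives assumption. The function names (`fit_element_weights`, `predict_subset`, `select_prompt`), the `max_elements` budget, and the synthetic data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def fit_element_weights(single_logps, full_logps):
    """Least-squares weights mapping per-element log-probs to full-prompt log-probs.

    single_logps: array of shape (n_examples, K), one column per library element.
    full_logps:   array of shape (n_examples,), observed with all K elements in the prompt.
    """
    weights, *_ = np.linalg.lstsq(single_logps, full_logps, rcond=None)
    return weights


def predict_subset(single_logps, weights, subset):
    """Predict log-probs for a subset of elements by renormalizing its weights.

    Assumption: dropping an element leaves the relative weights of the others
    unchanged. The division assumes the subset's weights have a positive sum.
    """
    idx = list(subset)
    w_sub = weights[idx] / weights[idx].sum()
    return single_logps[:, idx] @ w_sub


def select_prompt(single_logps, weights, max_elements):
    """Score every non-empty subset of at most max_elements elements; return the best."""
    k = single_logps.shape[1]
    scores = {}
    for size in range(1, max_elements + 1):
        for subset in combinations(range(k), size):
            preds = predict_subset(single_logps, weights, subset)
            scores[subset] = float(preds.mean())
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic, noise-free data (purely illustrative, not from the paper).
rng = np.random.default_rng(0)
K, N = 3, 20
single_logps = rng.normal(loc=-5.0, scale=1.0, size=(N, K))
true_w = np.array([0.5, 0.3, 0.2])
full_logps = single_logps @ true_w

weights = fit_element_weights(single_logps, full_logps)
best, scores = select_prompt(single_logps, weights, max_elements=K)
print("fitted weights:", np.round(weights, 3))
print("candidate prompts scored:", len(scores))
print("selected element indices:", best)
```

With `max_elements=K`, the loop scores all 2^K−1 non-empty subsets using only the K per-element measurements and the fitted weights, with no further LLM evaluations. This is the kind of extrapolation the framework aims for. A preference-based variant would replace the least-squares fit with a model trained on pairwise preference labels, which this sketch does not cover.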
