Prompts Generalize with Low Data: Non-vacuous Generalization Bounds for Optimizing Prompts with More Informative Priors
David Madras, Joshua Safyan, Qiuyi Zhang
TL;DR
This paper tackles generalization in prompt optimization under data scarcity by introducing data-dependent PAC-Bayes bounds that leverage perplexity-informed priors. It formalizes a data-dependent bound and shows how conditioning the prior on a meta-prompt and incorporating LLM perplexity terms tightens the KL regularizer, yielding non-vacuous guarantees even with limited data. The authors validate the approach empirically on a hate-speech classification task, demonstrating that informative priors (including data-dependent priors) lead to tighter bounds (around 0.46) and lower test error than uninformative priors. The work highlights the practical potential of perplexity-aware regularization for robust prompt generalization and outlines directions for future research, including richer priors and improved optimization algorithms.
Abstract
Many prompt engineering techniques have been successful in practice, even when optimizing over a large prompt space with a small amount of task-specific data. Recent work has partially explained this success by showing generalization bounds which apply PAC-Bayes theory to the discrete prompt space, but they are non-vacuous only in data-rich scenarios. We argue that such widespread success can be more fully explained by more carefully considering data- or distribution-dependent perplexity, which acts as an effective prior and steers the optimization towards prompts that are more "natural" for the task at hand. We derive novel generalization bounds that are non-vacuous for data-scarce prompt optimization via more useful priors, formally analyzing how perplexity regularization tightens these bounds by limiting exploration. Empirically, we explore both the bounds' effectiveness and the practical benefits of perplexity regularization in improving prompt generalization.
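
To make the role of the prior concrete, the following minimal sketch uses the classical Occam-style bound for a countable hypothesis set (the special case of PAC-Bayes with a point-mass posterior, combined with Hoeffding's inequality and a union bound). It is not the data-dependent bound derived in the paper. The function names, the per-token log-probabilities, the training error, and the sample size are all hypothetical values chosen for illustration. The sketch assumes a loss bounded in [0, 1] and a prior whose log-probability for a prompt is the sum of its token log-probabilities under a language model.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a prompt: exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def occam_bound(train_error, log_prior_prob, n, delta=0.05):
    """Occam-style upper bound on the expected error of a single prompt.

    With probability at least 1 - delta over n i.i.d. samples, for every
    prompt p in a countable set with prior probability P(p):
        R(p) <= R_hat(p) + sqrt((ln(1/P(p)) + ln(1/delta)) / (2n))
    log_prior_prob is ln P(p), assumed here to come from a language model.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n <= 0:
        raise ValueError("n must be positive")
    complexity = -log_prior_prob + math.log(1.0 / delta)
    # A bound above 1 is vacuous for a loss in [0, 1], so clip it.
    return min(1.0, train_error + math.sqrt(complexity / (2.0 * n)))


# Hypothetical per-token natural-log probabilities for two 10-token prompts.
natural_prompt = [-1.5] * 10    # low perplexity: the model finds it plausible
unnatural_prompt = [-8.0] * 10  # high perplexity: the model finds it unlikely

for name, logprobs in [("natural", natural_prompt), ("unnatural", unnatural_prompt)]:
    bound = occam_bound(train_error=0.2, log_prior_prob=sum(logprobs), n=100)
    print(f"{name}: perplexity={perplexity(logprobs):.2f}, bound={bound:.3f}")
```

In this simplified setting the complexity term ln(1/P(p)) equals the prompt length times the log of its perplexity, so at equal training error a lower-perplexity prompt receives a tighter bound. This is the sense in which a prior that concentrates mass on "natural" prompts can keep a bound non-vacuous when n is small.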
