Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation
Yao Lu, Jiayi Wang, Raphael Tang, Sebastian Riedel, Pontus Stenetorp
TL;DR
The paper investigates whether randomly sampled tokens can serve as effective separators for prompt-style text classification, challenging the view that prompts must be task-relevant or human-readable. It proposes a Random Separator Optimisation framework with three random generation strategies and scores candidate separators on a small labelled set, across nine datasets and eight language models. Random vocabulary separators often outperform human-curated prompts and come within 1% of self-optimising methods, with a 12% average relative improvement over human baselines and a greater than 40% average chance that a randomly drawn separator beats the human-curated one. The results also indicate that instruction-tuned models are not essential for proposing separators and that the language space is rich with effective prompts. This has broad implications for prompt engineering and in-context learning, including generative tasks, where chain-of-thought (CoT) prompts show high variance. Overall, the study establishes random separators as a strong, simple baseline for prompt optimisation that generalises across models and tasks and invites a re-evaluation of prompt design principles.
Abstract
Recent prompt optimisation approaches use the generative nature of language models to produce prompts, even rivalling the performance of human-curated prompts. In this paper, we demonstrate that randomly sampling tokens from the model vocabulary as "separators" can be as effective as language models for prompt-style text classification. Our experiments show that random separators are competitive baselines, with less than a 1% difference from previous self-optimisation methods and a 12% average relative improvement over strong human baselines across nine text classification tasks and eight language models. We further analyse this phenomenon in detail using three different random generation strategies, establishing that the language space is rich with potentially good separators: on average, there is a greater than 40% chance that a randomly drawn separator performs better than human-curated separators. These observations challenge the common assumption that an effective prompt should be human-readable or task-relevant, and they establish a strong baseline for prompt optimisation research.
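The core loop described in the abstract (draw random token strings, score each on a small labelled set, keep the best) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy vocabulary, the `build_prompt` template, the uniform sampling strategy, and the keyword-based `toy_predict` stand-in for a language model are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, gold label)


def sample_separators(vocab: Sequence[str], num_candidates: int,
                      length: int, rng: random.Random) -> List[str]:
    """Draw candidate separators by sampling tokens uniformly from the vocabulary."""
    return [" ".join(rng.choice(vocab) for _ in range(length))
            for _ in range(num_candidates)]


def build_prompt(text: str, separator: str) -> str:
    """Hypothetical template: the separator sits between the input and the label slot."""
    return f"{text} {separator}"


def score_separators(candidates: Sequence[str], dev_set: Sequence[Example],
                     predict: Callable[[str], str]) -> List[Tuple[str, float]]:
    """Return (separator, accuracy) pairs measured on the small labelled set."""
    scored = []
    for sep in candidates:
        correct = sum(predict(build_prompt(text, sep)) == label
                      for text, label in dev_set)
        scored.append((sep, correct / len(dev_set)))
    return scored


def win_rate(scored: Sequence[Tuple[str, float]], baseline_accuracy: float) -> float:
    """Fraction of random separators that strictly beat a baseline accuracy."""
    return sum(acc > baseline_accuracy for _, acc in scored) / len(scored)


def toy_predict(prompt: str) -> str:
    """Stand-in for a language model (assumption): a keyword rule over the whole prompt."""
    return "positive" if "good" in prompt else "negative"


# Toy data, for illustration only.
TOY_VOCAB = ["zx", "good", "##ly", "Qu", "7", "the"]
TOY_DEV_SET = [
    ("a good film", "positive"),
    ("good acting throughout", "positive"),
    ("dull plot", "negative"),
    ("boring and slow", "negative"),
]

rng = random.Random(0)
candidates = sample_separators(TOY_VOCAB, num_candidates=20, length=3, rng=rng)
scored = score_separators(candidates, TOY_DEV_SET, toy_predict)
best_separator, best_accuracy = max(scored, key=lambda pair: pair[1])
```

In a real experiment, `predict` would call a language model and `vocab` would be that model's token vocabulary. Passing the accuracy of a human-written separator to `win_rate` gives the kind of "chance of beating the human baseline" statistic the abstract reports.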
