
Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation

Yao Lu, Jiayi Wang, Raphael Tang, Sebastian Riedel, Pontus Stenetorp

TL;DR

The paper investigates whether randomly sampled tokens can serve as effective separators for prompt-style text classification, challenging the view that prompts must be task-focused or human-readable. It proposes a Random Separator Optimisation framework with three generation strategies, in which candidate separators are scored on a small labelled set, and evaluates it across nine datasets and eight models. The findings show that random vocabulary separators often outperform human-crafted prompts and are competitive with self-optimising methods. They yield an average relative improvement of about 12% over human baselines, and on average a randomly drawn separator has a greater than 40% chance of beating the human baseline on a given task. The results also show that instruction-tuned models are not essential for proposing separators and that the language space is rich with effective prompts. These findings have broad implications for prompt engineering and in-context learning, extending to generative tasks, where CoT prompts show high variance. Overall, the study establishes random separators as a strong, simple baseline for prompt optimisation that can generalise across models and tasks, and it invites a re-evaluation of prompt design principles.
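
The search loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only uniform sampling from the vocabulary (one of the three generation strategies), uses a toy whole-word vocabulary joined by spaces in place of a real tokenizer, and takes a hypothetical `score` callable standing in for the model's accuracy on the small labelled set. The function names and the defaults for the number of candidates and separator length are illustrative choices.

```python
import random
from typing import Callable, Sequence, Tuple


def sample_random_separator(vocab: Sequence[str], length: int, rng: random.Random) -> str:
    """Draw `length` tokens uniformly at random (with replacement) and join them."""
    # Toy assumption: vocabulary entries are whole words joined by spaces.
    # With a real subword tokenizer you would sample token ids and decode them instead.
    return " ".join(rng.choice(vocab) for _ in range(length))


def random_separator_search(
    vocab: Sequence[str],
    score: Callable[[str], float],
    num_candidates: int = 16,
    length: int = 5,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample `num_candidates` random separators and return the best one with its score.

    `score` is a hypothetical stand-in for accuracy on a small labelled set.
    """
    rng = random.Random(seed)
    best_sep, best_score = "", float("-inf")
    for _ in range(num_candidates):
        candidate = sample_random_separator(vocab, length, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # keeps the first of any tied candidates
            best_sep, best_score = candidate, candidate_score
    return best_sep, best_score


if __name__ == "__main__":
    toy_vocab = ["the", "##ing", "Mob", "qué", "]];", "Sentiment", ":", "zebra"]
    # Toy scorer for demonstration only: rewards separators that contain a colon.
    toy_score = lambda sep: float(":" in sep) + 0.01 * len(sep)
    print(random_separator_search(toy_vocab, toy_score, num_candidates=8, length=3))
```

With a real model, `score` would typically build prompts with each candidate separator, run the labelled examples through the model and return accuracy; the search then keeps the best-scoring string regardless of whether it is readable.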

Abstract

Recent prompt optimisation approaches use the generative nature of language models to produce prompts, even rivalling the performance of human-curated prompts. In this paper, we demonstrate that randomly sampling tokens from the model vocabulary as "separators" can be as effective as using language models to generate them for prompt-style text classification. Our experiments show that random separators are competitive baselines: they differ by less than 1% from previous self-optimisation methods and show a 12% average relative improvement over strong human baselines across nine text classification tasks and eight language models. We further analyse this phenomenon in detail using three different random generation strategies, establishing that the language space is rich with potentially good separators: on average, a randomly drawn separator has a greater than 40% chance of outperforming human-curated separators. These observations challenge the common assumption that an effective prompt should be human-readable or task-relevant and establish a strong baseline for prompt optimisation research.
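
The two headline quantities in the abstract can be made concrete with a small sketch. It assumes that relative improvement is measured as (method - baseline) / baseline and averaged over tasks, and that the chance of beating the human separator is the fraction of sampled separators whose accuracy strictly exceeds it. The paper's exact aggregation may differ, the function names are hypothetical, and any accuracies used with them here are toy values, not results from the paper.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple


def relative_improvement(method_acc: float, baseline_acc: float) -> float:
    """(method - baseline) / baseline; 0.12 corresponds to a 12% relative improvement."""
    return (method_acc - baseline_acc) / baseline_acc


def mean_relative_improvement(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average of per-task relative improvements over (method_acc, baseline_acc) pairs."""
    return mean(relative_improvement(m, b) for m, b in pairs)


def win_rate(candidate_accs: Sequence[float], baseline_acc: float) -> float:
    """Fraction of sampled separators whose accuracy strictly exceeds the baseline."""
    return sum(acc > baseline_acc for acc in candidate_accs) / len(candidate_accs)
```

For example, toy per-task accuracies of 0.60 vs 0.50 and 0.44 vs 0.40 give relative improvements of 20% and 10%, so the per-task average is 15%; note that this can differ from the relative improvement computed on averaged accuracies.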

Paper Structure

This paper contains 35 sections, 2 figures, and 13 tables.

Figures (2)

  • Figure 1: Illustration of our approach when searching for good separators for a sentiment classification task. Rather than relying on human knowledge or on external large language models to suggest alternatives, we find that separators selected at random from the vocabulary can also yield good performance.
  • Figure 2: Our random separator optimisation procedure.
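
To make the role of the separator in Figure 1 concrete, the following sketch assembles a few-shot sentiment prompt. It assumes, for illustration only, that the separator is the string placed between each input and its label word, with the query left open for the model to complete; `build_prompt`, the example sentences and the "random" separator string are hypothetical.

```python
from typing import Sequence, Tuple


def build_prompt(demos: Sequence[Tuple[str, str]], query: str, separator: str) -> str:
    """Join labelled demonstrations and an unlabelled query, placing the
    separator between each input text and its label."""
    lines = [f"{text} {separator} {label}" for text, label in demos]
    # The query ends with the separator; the model is expected to continue with a label.
    lines.append(f"{query} {separator}")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        ("A gripping, heartfelt film.", "positive"),
        ("Dull and far too long.", "negative"),
    ]
    query = "I loved every minute."
    # Human-written separator versus a hypothetical randomly sampled one.
    for sep in ["Sentiment:", "qué Mob ]];"]:
        print(build_prompt(demos, query, sep))
        print("---")
```

Swapping `"Sentiment:"` for a random string changes only the separator slot. A common way to then classify is to compare the model's scores for each label word after the final separator, though the paper's exact scoring is not reproduced here.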