TATRA: Training-Free Instance-Adaptive Prompting Through Rephrasing and Aggregation

Bartosz Dziuba, Kacper Kuchta, Paweł Batorski, Przemysław Spurek, Paul Swoboda

TL;DR

TATRA is a dataset-free prompting method that constructs instance-specific few-shot prompts by synthesizing examples on the fly to accompany a user-provided instruction. The results suggest that per-instance construction of effective in-context examples matters more than running long, expensive optimization loops to produce a single prompt per task.

Abstract

Large Language Models (LLMs) have improved substantially in alignment, yet their behavior remains highly sensitive to prompt phrasing. This brittleness has motivated automated prompt engineering, but most existing methods (i) require a task-specific training set, (ii) rely on expensive iterative optimization to produce a single dataset-level prompt, and (iii) must be rerun from scratch for each new task. We introduce TATRA, a dataset-free prompting method that constructs instance-specific few-shot prompts by synthesizing on-the-fly examples to accompany a user-provided instruction. TATRA requires no labeled training data and avoids task-specific optimization loops, while retaining the benefits of demonstration-based prompting. Across standard text classification benchmarks, TATRA matches or improves over strong prompt-optimization baselines that depend on training data and extensive search. On mathematical reasoning benchmarks, TATRA achieves state-of-the-art performance on GSM8K and DeepMath, outperforming methods that explicitly optimize prompts on those tasks. Our results suggest that per-instance construction of effective in-context examples is more important than running long, expensive optimization loops to produce a single prompt per task. Code is available at https://github.com/BMD223/TATRA

Paper Structure

This paper contains 25 sections, 2 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Comparison of TATRA to existing automated prompt-engineering methods. Most prior approaches require a task-specific training set and run expensive dataset-level optimization loops to produce a single prompt per task. In contrast, TATRA is training-free and dataset-free, constructing a small set of instance-specific few-shot demonstrations on the fly and aggregating predictions across rephrasings for robust per-sample prompting.
  • Figure 2: Overview of the TATRA prompting pipeline for subjective/objective classification. The process begins with an initial observation. A generator model produces multiple semantically equivalent paraphrases (e.g., swapping "native americans" for "indigenous peoples") to ensure linguistic robustness. Simultaneously, a set of label-balanced, synthetic few-shot examples is generated. These examples are shuffled and concatenated to both the original observation and its paraphrases. An evaluator model then provides a prediction for each variant. This paraphrase and in-context example generation is run $r$ times independently. Finally, all individual predictions are aggregated via majority vote (showing a strong bias toward the "Subj" label in this instance) to produce the final prediction.
  • Figure 3: Impact of increasing the number of in-context examples ($k$). We vary $k \in \{4, \dots, 32\}$ while keeping $n$ fixed. The shading indicates the standard deviation across seeds.
  • Figure 4: Impact of increasing the number of paraphrases ($n$). We vary $n \in \{0, \dots, 15\}$ while keeping $k$ fixed. The shading indicates the standard deviation across seeds.
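
The pipeline described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `tatra_predict`, the three callables (`paraphrase`, `make_examples`, `evaluate`), and the prompt template are all hypothetical stand-ins for the generator and evaluator models. Reshuffling the examples for every variant is also an assumption, since the caption only says the examples are shuffled.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, label); hypothetical representation


def tatra_predict(
    observation: str,
    paraphrase: Callable[[str, int], List[str]],         # stand-in for the generator
    make_examples: Callable[[str, int], List[Example]],  # stand-in for the generator
    evaluate: Callable[[str], str],                      # stand-in for the evaluator
    n: int = 3,   # number of paraphrases
    k: int = 4,   # number of synthetic in-context examples
    r: int = 2,   # number of independent generation rounds
    seed: int = 0,
) -> str:
    """Collect one prediction per variant per round, then take a majority vote."""
    rng = random.Random(seed)
    votes: List[str] = []
    for _ in range(r):
        # Each round draws fresh paraphrases and fresh synthetic examples.
        variants = [observation] + paraphrase(observation, n)
        examples = list(make_examples(observation, k))
        for variant in variants:
            # Assumption: reshuffle the demonstrations for every variant.
            rng.shuffle(examples)
            demos = "\n".join(f"Input: {t}\nLabel: {l}" for t, l in examples)
            prompt = f"{demos}\nInput: {variant}\nLabel:"
            votes.append(evaluate(prompt))
    # Majority vote over all r * (n + 1) predictions.
    return Counter(votes).most_common(1)[0][0]
```

In practice `make_examples` would be expected to return a label-balanced set, as the caption describes; the sketch does not enforce this. Each call issues $r \cdot (n + 1)$ evaluator queries, which is the per-instance cost traded against the dataset-level optimization loops of prior methods.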