Prompt Baking

Aman Bhargava; Cameron Witkowski; Alexander Detkov; Matt Thomson

TL;DR

We present a technique for baking prompts into the weights of an LLM, with implications for AI safety, continuous model updating, real-time learning in LLM-based agents, and more stable AI personas.

Abstract

Two primary ways to change LLM behavior are prompting and weight updates (e.g., fine-tuning). Prompting LLMs is simple and effective, specifying the desired changes explicitly in natural language, whereas weight updates provide more expressive and permanent behavior changes, specified implicitly via training on large datasets. We present a technique for "baking" prompts into the weights of an LLM. Prompt Baking converts a prompt $u$ and initial weights $\theta$ to a new set of weights $\theta_u$ such that the new "baked" LLM behaves like the original prompted LLM. Mathematically, we minimize the KL divergence between $P_\theta(\cdot | u)$ and $P_{\theta_u}(\cdot)$, where $P$ is the LLM's probability distribution over token sequences. Across all our experiments, we find prompts can be readily baked into weight updates. Baking chain-of-thought prompts improves zero-shot performance on GSM8K, ASDiv, MBPP, ARC-Easy, ARC-Challenge, and CommonsenseQA benchmarks. Baking news headlines directly updates an LLM's knowledge. Baking instructions and personas alleviates "prompt forgetting" over long sequences. Furthermore, stopping baking early creates "half-baked" models, continuously scaling prompt strength. Baked models retain their sensitivity to further prompting and baking, including re-prompting with the baked-in prompt. Surprisingly, the re-prompted models yield further performance gains in instruction following, as well as math reasoning and coding benchmarks. Taking re-prompting and re-baking to the limit yields a form of iterative self-improvement we call Prompt Pursuit, and preliminary results on instruction following exhibit dramatic performance gains. Finally, we discuss implications for AI safety, continuous model updating, enhancing real-time learning capabilities in LLM-based agents, and generating more stable AI personas.
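To make the KL objective concrete, the following toy sketch bakes a "prompt" into a single categorical next-token distribution rather than a real LLM. Everything here is an illustrative assumption, not the authors' training procedure: the four-token vocabulary, the `prompt_effect` logit shift standing in for the effect of the prompt $u$, and the plain gradient-descent loop in the hypothetical `bake` function.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bake(prompted_probs, init_logits, lr=0.5, steps=1000):
    """Gradient descent on KL(prompted || baked) w.r.t. the baked logits.

    Hypothetical helper: a stand-in for updating theta toward theta_u.
    """
    logits = list(init_logits)
    history = []
    for _ in range(steps):
        q = softmax(logits)
        history.append(kl_divergence(prompted_probs, q))
        # For q = softmax(z): d KL(p || q) / d z_i = q_i - p_i
        logits = [z - lr * (qi - pi)
                  for z, qi, pi in zip(logits, q, prompted_probs)]
    return logits, history

# Assumed toy setup: unprompted logits theta, and a logit shift that
# plays the role of conditioning on the prompt u.
theta = [0.0, 0.0, 0.0, 0.0]
prompt_effect = [2.0, -1.0, 0.5, 0.0]
prompted = softmax([t + e for t, e in zip(theta, prompt_effect)])  # P_theta(. | u)

theta_u, history = bake(prompted, theta)
baked = softmax(theta_u)  # P_{theta_u}(.), with no prompt applied

print("KL before baking:", history[0])
print("KL after baking: ", history[-1])
```

In this toy setting, stopping the loop at an intermediate step leaves the distribution partway between the unprompted and prompted behavior, which loosely mirrors the "half-baked" models described above.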

Paper Structure (39 sections, 18 equations, 8 figures, 3 tables)

Introduction
Contribution
Related Work
Model Distillation:
LLM Control Theory:
Self-Improvement:
Knowledge Editing:
Continual Learning and Catastrophic Forgetting:
Prompt Decay:
Activation Manipulation:
Methods
Results
Example: Baking In Sadness
Baking in instruction following prompts
Baking in chain-of-thought examples on Academic Benchmarks
...and 24 more sections

Figures (8)

Figure 1: An illustration of Prompt Baking.
Figure 2: Prompt Baking: "Always sad" example. A. Behavior Interpolation: Line plot shows the negative sentiment of the baked model (blue) climbing during the baking process from the baseline model with no prompt $(\theta, \varnothing)$ in green to the prompted baseline performance $(\theta, \mathbf u)$ in orange. B. Response Likelihood Alignment $(r^2)$: After baking in the "always sad" prompt $\mathbf u$, the baked model's likelihoods over token sequences closely correlate with those of the prompted baseline ($r^2$ increases from $0.69$ to $0.98$). C. Token-Level Alignment: Example showing how token likelihoods in unprompted responses align with the prompted baseline after baking, initially diverging from the original unprompted model.
Figure 3: Baking instruction following prompts yields baked models that perform to within 8% of the baseline prompted performance. Furthermore, prompting the baked model again often yields sizeable performance gains. For pursuit (green icons) see Section \ref{sec:pursuit}.
Figure 4: Baking few-shot examples improves zero-shot performance on all benchmarks, and comes within 1.4% of the full few-shot accuracy on all benchmarks. Values listed are the averages from training with 3 random seeds.
Figure 5: Baking then prompting the baked model often surpasses the original model's few-shot performance. Values listed are the averages from training with 3 random seeds.
...and 3 more figures
