Instruction Following without Instruction Tuning

John Hewitt; Nelson F. Liu; Percy Liang; Christopher D. Manning

TL;DR

The paper investigates whether instruction following can emerge without explicit instruction tuning. It identifies two forms of implicit adaptation that induce instruction-following behavior: response tuning and single-task finetuning. Training solely on responses, with no instructions, achieves a substantial win rate in head-to-head comparisons against instruction-tuned baselines, and finetuning on a single narrow domain can generalize to broad instruction following on dissimilar prompts. To explain this phenomenon, the authors show that simple changes to the output distribution, including a handcrafted rule-based adapter combined with the pretrained model as a product of distributions, can produce instruction-following behavior. These findings imply that practitioners should test adapted models across a broad range of inputs and safety scenarios, since implicit instruction tuning can emerge from seemingly unrelated training regimes. The results also invite a closer look at how pretrained models encode instruction-response mappings and how small changes to their distributions can steer behavior.
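To make the distinction between instruction tuning and response tuning concrete, here is a minimal sketch of how one training example might be assembled under each regime. It assumes an already tokenized instruction and response, and the common convention that label positions set to -100 are excluded from the loss (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). The function `build_example` and its parameters are hypothetical; whether the paper's response-tuning data keeps any chat-template markers is not stated in this summary, so the sketch simply omits the instruction tokens.

```python
from typing import List, Tuple

# Label value skipped by the loss (matches PyTorch's default ignore_index).
IGNORE = -100


def build_example(instruction_ids: List[int], response_ids: List[int],
                  eos_id: int, response_only: bool) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) for one hypothetical training example.

    Instruction tuning: condition on the instruction, learn the response.
    Response tuning: drop the instruction entirely, learn the response alone.
    """
    prefix = [] if response_only else list(instruction_ids)
    target = list(response_ids) + [eos_id]
    input_ids = prefix + target
    # Loss is computed only on response tokens and EOS; prefix positions are masked.
    labels = [IGNORE] * len(prefix) + target
    return input_ids, labels


if __name__ == "__main__":
    inst, resp, eos = [11, 12, 13], [21, 22], 2
    print(build_example(inst, resp, eos, response_only=False))
    # prints ([11, 12, 13, 21, 22, 2], [-100, -100, -100, 21, 22, 2])
    print(build_example(inst, resp, eos, response_only=True))
    # prints ([21, 22, 2], [21, 22, 2])
```

In both regimes the model is trained on the same response tokens; the only difference is whether those tokens are conditioned on their instruction.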

Abstract

Instruction tuning commonly means finetuning a language model on instruction-response pairs. We discover two forms of adaptation (tuning) that are deficient compared to instruction tuning, yet still yield instruction following; we call this implicit instruction tuning. We first find that instruction-response pairs are not necessary: training solely on responses, without any corresponding instructions, yields instruction following. This suggests pretrained models have an instruction-response mapping which is revealed by teaching the model the desired distribution of responses. However, we then find it's not necessary to teach the desired distribution of responses: instruction-response training on narrow-domain data like poetry still leads to broad instruction-following behavior like recipe generation. In particular, when instructions are very different from those in the narrow finetuning domain, models' responses do not adhere to the style of the finetuning domain. To begin to explain implicit instruction tuning, we hypothesize that very simple changes to a language model's distribution yield instruction following. We support this by hand-writing a rule-based language model which yields instruction following in a product-of-experts with a pretrained model. The rules are to slowly increase the probability of ending the sequence, penalize repetition, and uniformly change 15 words' probabilities. In summary, adaptations made without being designed to yield instruction following can do so implicitly.
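The rule-based adapter can be read as a product of experts: at each step the next-token distribution is p(w | context) ∝ p_base(w | context) · p_rule(w | context), which in log space means adding the rule expert's scores to the base model's log-probabilities and renormalizing. Below is a minimal sketch of that combination using the three rules named in the abstract. The linear EOS schedule, the flat repetition penalty, the constants, and the word-shift table are illustrative assumptions rather than the authors' values; `rule_scores` and `product_of_experts` are hypothetical helpers, and the toy 4-token vocabulary stands in for a real model's next-token distribution.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of log-scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def rule_scores(vocab_size: int, generated: Sequence[int], eos_id: int,
                word_shifts: Dict[int, float],
                eos_slope: float = 0.05, repeat_penalty: float = 1.5) -> List[float]:
    """Unnormalized log-scores of a hypothetical rule-based expert."""
    scores = [0.0] * vocab_size
    # Rule 1 (assumed linear): EOS becomes steadily more likely as the response grows.
    scores[eos_id] += eos_slope * len(generated)
    # Rule 2 (assumed flat penalty): down-weight tokens already in the response.
    for tok in set(generated):
        if tok != eos_id:
            scores[tok] -= repeat_penalty
    # Rule 3: fixed, context-independent shifts for a small word list (hypothetical ids).
    for tok, shift in word_shifts.items():
        scores[tok] += shift
    return scores


def product_of_experts(base_logprobs: Sequence[float], rule: Sequence[float]) -> List[float]:
    """p(w) ∝ p_base(w) * exp(rule(w)): add log-scores, then renormalize."""
    return softmax([b + r for b, r in zip(base_logprobs, rule)])


if __name__ == "__main__":
    # Toy vocabulary of 4 tokens; id 0 plays the role of EOS.
    base = [math.log(p) for p in [0.1, 0.4, 0.3, 0.2]]
    rule = rule_scores(4, generated=[1, 1, 2], eos_id=0, word_shifts={3: 0.5})
    probs = product_of_experts(base, rule)
    print([round(p, 3) for p in probs])
```

Because the rule expert only contributes additive log-scores, it never needs to be normalized on its own; the final softmax handles that.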

Paper Structure (35 sections, 5 equations, 6 figures, 8 tables)

Introduction
A note on pretraining data
Related Work
Designing instruction tuning datasets.
Ablation studies on instruction tuning.
Out-of-distribution generalization.
Experiment Setting
Instruction tuning.
Instruction formatting.
Defining instruction-following behavior.
Response Tuning Yields Instruction Following
Response Tuning
Method.
Experiments.
Results.
...and 20 more sections

Figures (6)

Figure 1: Instruction tuning trains a language model on responses conditioned on instructions. We find that (1) response tuning (estimating the probability of responses with no instructions), (2) single-task finetuning (e.g., code or poetry generation), and even (3) a simple rule-based adapter all yield language models with general instruction-following behavior.
Figure 2: Responses from response tuning, instruction tuning, and the base Llama-2-7B model.
Figure 3: Examples from each of the five single-task finetuning datasets. For each dataset, the input that is conditioned on is shown at the left, and the output that is learned is shown at the right.
Figure 4: Responses generated by single-task finetuned models for each of our five datasets. MBPP trains only on python snippet generation, GSM on math word problems, Poetry on poetry generation, Recipe on recipe generation, and Chess on chess game generation. Yet, except for Chess, the responses deviate from the single-task behavior towards reasonable responses.
Figure 5: For a GSM-finetuned model, the similarity between a test-time instruction and the instructions in the GSM dataset (x-axis), plotted against the similarity between the model's generated response and GSM responses, minus the similarity of that response to LIMA broad responses (y-axis). On the left, an example of an average-similarity instruction; note that the response is unlike GSM formatting, except for the telltale ####1, which is how GSM formats its final answer. On the right, a very high-similarity instruction leads to GSM-like behavior.
...and 1 more figure
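The Figure 5 axes combine several similarity scores. The sketch below shows one way such a point could be computed, assuming bag-of-words cosine similarity averaged over each reference set; the caption does not say which similarity measure the paper uses, so the measure and the helper names (`mean_similarity`, `figure5_point`) are hypothetical. The x-coordinate is the instruction's similarity to GSM instructions, and the y-coordinate is the response's similarity to GSM responses minus its similarity to LIMA responses, so a positive y means the response looks more GSM-like than broadly helpful.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def bow(text: str) -> Counter:
    """Lowercased whitespace bag-of-words (an assumed stand-in representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_similarity(text: str, corpus: Iterable[str]) -> float:
    """Average similarity of one text to every document in a reference set."""
    v = bow(text)
    sims = [cosine(v, bow(doc)) for doc in corpus]
    return sum(sims) / len(sims) if sims else 0.0


def figure5_point(instruction: str, response: str, gsm_instructions: Iterable[str],
                  gsm_responses: Iterable[str], lima_responses: Iterable[str]) -> Tuple[float, float]:
    """Hypothetical (x, y) coordinates for one test instruction and its response."""
    x = mean_similarity(instruction, gsm_instructions)
    y = mean_similarity(response, gsm_responses) - mean_similarity(response, lima_responses)
    return x, y


if __name__ == "__main__":
    print(figure5_point(
        "how many apples does she have",
        "she has 3 apples #### 3",
        gsm_instructions=["how many apples does she have left"],
        gsm_responses=["she has 3 apples #### 3"],
        lima_responses=["here is a simple recipe for bread"],
    ))
```

Averaging over the whole reference set is one choice among several; a nearest-neighbor (maximum) similarity would emphasize near-duplicate instructions instead.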
