In-context Learning and Induction Heads

Catherine Olsson; Nelson Elhage; Neel Nanda; Nicholas Joseph; Nova DasSarma; Tom Henighan; Ben Mann; Amanda Askell; Yuntao Bai; Anna Chen; Tom Conerly; Dawn Drain; Deep Ganguli; Zac Hatfield-Dodds; Danny Hernandez; Scott Johnston; Andy Jones; Jackson Kernion; Liane Lovitt; Kamal Ndousse; Dario Amodei; Tom Brown; Jack Clark; Jared Kaplan; Sam McCandlish; Chris Olah

TL;DR

Induction heads develop at precisely the same point in training as a sudden, sharp increase in in-context learning ability, which is visible as a bump in the training loss.

Abstract

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
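The two definitions in the abstract can be made concrete with a small sketch. The code below is a token-level caricature, not the paper's implementation: a real induction head carries out this lookup through attention inside the model. The function names, the exact-match rule, and the `early`/`late` index parameters are illustrative assumptions.

```python
from typing import Optional, Sequence


def induction_complete(tokens: Sequence[str]) -> Optional[str]:
    """Predict the next token with the [A][B] ... [A] -> [B] rule.

    Finds the most recent earlier occurrence of the final token and
    returns the token that followed it, or None if there is no match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the final token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


def in_context_learning_score(
    per_token_loss: Sequence[float], early: int, late: int
) -> float:
    """Loss at a late token index minus loss at an early one.

    A negative value means the loss decreases with more context, which is
    the sense of "in-context learning" used in the abstract. The choice of
    indices is an assumption left to the caller.
    """
    return per_token_loss[late] - per_token_loss[early]


if __name__ == "__main__":
    # The final "Mr" was previously followed by "Dursley".
    print(induction_complete(["Mr", "Dursley", "said", "Mr"]))
    # Hypothetical per-token losses that fall as context grows.
    print(in_context_learning_score([4.0, 3.0, 2.5], early=0, late=2))
```

Exact string matching is the simplest choice here; it is used only to keep the sketch short and makes no claim about how matching behaves in trained models.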

Paper Structure (1 section, 2 equations, 22 figures, 8 tables)

- A. PER-TOKEN LOSSES ON HARRY POTTER

Figures (22)

Figure 1: SUMMARY OF EVIDENCE FOR SUB-CLAIMS (STRONGEST ARGUMENT FOR EACH)
Figure 2: TWO LAYER (ATTENTION-ONLY)
Figure 3: The highlighted "phase change" portion of training is the same area highlighted in previous plots. It is selected based on the derivative of the in-context score.
Figure 4: INDUCTION HEADS FORM IN PHASE CHANGE Each line is an attention head, scored by the "prefix matching" evaluation introduced below.
Figure 5: LOSS CURVES DIVERGE DURING PHASE CHANGE
...and 17 more figures