Which Attention Heads Matter for In-Context Learning?

Kayo Yin, Jacob Steinhardt

TL;DR

The paper asks which attention heads underlie in-context learning (ICL) in decoder-only transformers by directly comparing induction heads and function vector (FV) heads across 12 models. It ablates heads by replacing their outputs with mean outputs, and uses an exclusion strategy to separate the overlapping contributions of the two head types. The main finding is that FV heads principally drive few-shot ICL, while induction heads contribute mainly as precursors that facilitate the emergence of the FV mechanism. The authors show that many FV heads originate as induction heads during training, and that although few heads belong to both sets, FV heads tend to have relatively high induction scores and vice versa. They also discuss methodological implications, including how the choice of metric and the model scale affect conclusions, and propose that induction heads act as a stepping stone to the more powerful FV mechanism. Overall, the work revises the understanding of ICL mechanisms and offers insights for interpretability across model scales and architectures.
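The ablation setup described above can be illustrated with a small sketch. This is not the authors' code: the head identifiers, the helper names (`top_heads`, `top_heads_excluding`, `mean_ablate`), the selection fractions, and the toy scores are all assumptions made for illustration. It only shows the general idea of ranking heads by a score, excluding heads that rank highly on the other score, and replacing the ablated heads' outputs with precomputed mean outputs.

```python
from typing import Dict, List, Set, Tuple

# (layer index, head index); a hypothetical identifier format.
Head = Tuple[int, int]


def top_heads(scores: Dict[Head, float], fraction: float) -> List[Head]:
    """Return the highest-scoring `fraction` of heads (at least one head)."""
    k = max(1, round(len(scores) * fraction))
    return sorted(scores, key=lambda h: scores[h], reverse=True)[:k]


def top_heads_excluding(
    scores: Dict[Head, float],
    other_scores: Dict[Head, float],
    fraction: float,
    exclude_fraction: float,
) -> List[Head]:
    """Top heads by `scores`, skipping heads that are in the top
    `exclude_fraction` by `other_scores` (the exclusion idea)."""
    k = max(1, round(len(scores) * fraction))
    excluded = set(top_heads(other_scores, exclude_fraction))
    ranked = sorted(scores, key=lambda h: scores[h], reverse=True)
    return [h for h in ranked if h not in excluded][:k]


def mean_ablate(
    head_outputs: Dict[Head, List[float]],
    mean_outputs: Dict[Head, List[float]],
    ablated: Set[Head],
) -> Dict[Head, List[float]]:
    """Replace each ablated head's output with its precomputed mean output;
    leave all other heads unchanged."""
    return {
        h: list(mean_outputs[h]) if h in ablated else list(out)
        for h, out in head_outputs.items()
    }


# Toy scores for a 2-layer, 2-head model (made-up numbers).
induction_scores = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.2}
fv_scores = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.9, (1, 1): 0.3}

induction_heads = top_heads(induction_scores, 0.5)
non_fv_induction_heads = top_heads_excluding(induction_scores, fv_scores, 0.5, 0.5)

outputs = {(0, 0): [1.0, 2.0], (0, 1): [3.0, 4.0]}
means = {(0, 0): [0.0, 0.0], (0, 1): [9.0, 9.0]}
ablated_outputs = mean_ablate(outputs, means, {(0, 1)})
```

In a real model the replacement would typically happen inside the forward pass, and the mean outputs would be estimated over a reference dataset; both of those steps are omitted here.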

Abstract

Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task. To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. Through detailed ablations, we discover that few-shot ICL performance depends primarily on FV heads, especially in larger models. In addition, we uncover that FV and induction heads are connected: many FV heads start as induction heads during training before transitioning to the FV mechanism. This leads us to speculate that induction facilitates learning the more complex FV mechanism that ultimately drives ICL.

Paper Structure

This paper contains 28 sections, 1 equation, 22 figures, 3 tables.

Figures (22)

  • Figure 1: (a) Ablating function vector (FV) heads significantly degrades few-shot in-context learning (ICL) accuracy, while ablating induction heads has minimal impact beyond ablating random heads. (b) Evolution of an FV head during training, showing high induction scores early in training that decrease as the FV score emerges. This pattern suggests induction may serve as a precursor to the FV mechanism.
  • Figure 2: Location of induction heads (blue) and FV heads (pink) in model layers. The average layers of induction and FV heads are shown as blue and pink dotted lines, respectively. Most induction heads appear in early-middle layers, while FV heads appear in layers slightly deeper than induction heads.
  • Figure 3: Percentage of head overlap between induction and FV heads (left) in green, and between induction and randomly sampled heads in gray. Percentile of induction score of FV heads (center). Percentile of FV score of induction heads (right). There is little overlap between induction and FV heads, but FV heads have relatively high induction scores and vice versa.
  • Figure 4: Top: Few-shot ICL accuracy after ablating induction and FV heads. Center: Few-shot ICL accuracy after ablating non-FV induction and non-induction FV heads. Bottom: Token-loss difference after ablating non-FV induction and non-induction FV heads. Ablating FV heads leads to a bigger drop in ICL accuracy, especially in larger models. Ablating induction heads with low FV scores does not significantly affect ICL accuracy. ICL accuracy and token-loss difference behave differently.
  • Figure 5: Evolution of induction and FV scores, averaged over the top 2% of heads, across training. The induction score rises sharply, then plateaus. The FV score rises slightly later and increases gradually. ICL accuracy begins to rise around the same time as the induction score and then increases gradually.
  • ...and 17 more figures
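
The two quantities compared in the Figure 3 caption, head overlap and score percentile, can be sketched as follows. This is an illustrative reading of the caption rather than the paper's implementation: the function names, the strict "less than" percentile convention, and the toy numbers are assumptions.

```python
from typing import Dict, Set, Tuple

# (layer index, head index); a hypothetical identifier format.
Head = Tuple[int, int]


def overlap_percentage(heads_a: Set[Head], heads_b: Set[Head]) -> float:
    """Percentage of heads in `heads_a` that also appear in `heads_b`."""
    if not heads_a:
        return 0.0
    return 100.0 * len(heads_a & heads_b) / len(heads_a)


def score_percentile(scores: Dict[Head, float], head: Head) -> float:
    """Percentage of heads whose score is strictly below `head`'s score."""
    below = sum(1 for s in scores.values() if s < scores[head])
    return 100.0 * below / len(scores)


# Toy example with made-up numbers.
toy_induction_heads = {(0, 0), (0, 1), (1, 0), (1, 1)}
toy_fv_heads = {(1, 0), (1, 1), (2, 0)}
toy_overlap = overlap_percentage(toy_induction_heads, toy_fv_heads)

toy_scores = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
toy_percentile = score_percentile(toy_scores, (1, 1))
```

Under this convention, a set of FV heads can share few members with the set of induction heads while still sitting at a high percentile of the induction score, which is the pattern the caption describes.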