Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt
TL;DR
The paper asks which attention heads underlie in-context learning (ICL) in decoder-only transformers by directly comparing induction heads and function vector (FV) heads across 12 models. It ablates heads by replacing their outputs with mean outputs, and uses an exclusion strategy to separate the overlapping contributions of the two head types. The ablations indicate that FV heads principally drive few-shot ICL, while induction heads contribute mainly as precursors that facilitate the emergence of the FV mechanism. The authors show that many FV heads start out as induction heads during training, and that the two sets of heads overlap only to a limited extent even though their scores are correlated. They also discuss methodological implications, including how metric choice and model scale affect conclusions, and propose that induction heads act as a stepping stone to the more powerful FV mechanism. Overall, the work revises the understanding of ICL mechanisms and offers insights for interpretability across model scales and architectures.
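The mean-output ablation and exclusion strategy mentioned above can be illustrated with a small sketch. This is not the authors' code: the function names `top_heads` and `mean_ablate`, the array shapes, and the choice of `k` are all assumptions made for illustration, and the per-head scores are taken as given rather than computed.

```python
import numpy as np

def top_heads(scores, k, exclude=frozenset()):
    """Return the k highest-scoring (layer, head) pairs not in `exclude`.

    `scores` is a hypothetical (n_layers, n_heads) array holding one score
    per head, e.g. an induction score or an FV score.
    """
    order = np.argsort(scores, axis=None)[::-1]  # flat indices, best first
    picked = []
    for flat in order:
        head = tuple(int(i) for i in np.unravel_index(flat, scores.shape))
        if head not in exclude:
            picked.append(head)
        if len(picked) == k:
            break
    return picked

def mean_ablate(head_outputs, mean_outputs, heads):
    """Replace the outputs of the selected heads with their mean outputs.

    Both arrays are assumed to have shape (n_layers, n_heads, d_model);
    `mean_outputs` would be averaged over some reference dataset.
    The input array is left unmodified.
    """
    ablated = head_outputs.copy()
    for layer, head in heads:
        ablated[layer, head] = mean_outputs[layer, head]
    return ablated

# Toy scores for a 2-layer, 2-head model (made-up numbers).
induction_scores = np.array([[0.9, 0.1], [0.2, 0.8]])
fv_scores = np.array([[0.7, 0.3], [0.95, 0.1]])

top_induction = top_heads(induction_scores, k=2)
# Exclusion: pick top FV heads that are not also top induction heads.
fv_only = top_heads(fv_scores, k=2, exclude=set(top_induction))
```

In this toy example, head (0, 0) scores highly on both metrics, so the exclusion step drops it from the FV set. Ablating only `fv_only` then measures the effect of FV heads without also removing heads that count as induction heads.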
Abstract
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task. To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. Through detailed ablations, we discover that few-shot ICL performance depends primarily on FV heads, especially in larger models. In addition, we uncover that FV and induction heads are connected: many FV heads start as induction heads during training before transitioning to the FV mechanism. This leads us to speculate that induction facilitates learning the more complex FV mechanism that ultimately drives ICL.
