
Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning

Joy Crosbie, Ekaterina Shutova

TL;DR

The paper asks how in-context learning (ICL) arises inside large language models and focuses on induction heads as a key mechanism for pattern matching. Working with two state-of-the-art models, Llama-3-8B and InternLM2-20B, it identifies induction heads, ablates them selectively, and applies attention knockout to establish a causal link between prefix-matching and copying operations and few-shot ICL performance on abstract pattern recognition and NLP tasks. Ablating induction heads degrades ICL performance significantly, while random head ablations have a milder effect; disrupting only the induction pattern via attention knockout can match or exceed the impact of removing the heads entirely. Together, the findings provide strong mechanistic evidence that induction heads implement a fuzzy prefix-matching and copying mechanism essential for ICL, a concrete contribution to mechanistic interpretability and to the understanding of few-shot learning in transformers.

Abstract

Large language models (LLMs) have shown a remarkable ability to learn and perform complex tasks through in-context learning (ICL). However, a comprehensive understanding of the internal mechanisms behind ICL is still lacking. This paper explores the role of induction heads in a few-shot ICL setting. We analyse two state-of-the-art models, Llama-3-8B and InternLM2-20B, on abstract pattern recognition and NLP tasks. Our results show that even a minimal ablation of induction heads leads to ICL performance decreases of up to ~32% for abstract pattern recognition tasks, bringing the performance close to random. For NLP tasks, this ablation substantially decreases the model's ability to benefit from examples, bringing few-shot ICL performance close to that of zero-shot prompts. We further use attention knockout to disable specific induction patterns, and present fine-grained evidence for the role that the induction mechanism plays in ICL.
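
The two interventions named in the abstract, head ablation and attention knockout, can be illustrated on toy arrays. The following is a minimal sketch and not the authors' implementation: the function names are hypothetical, zero-ablation is an assumption (this page does not say which ablation variant the paper uses), and knockout is modelled as setting chosen query-key scores to negative infinity before the softmax.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(scores, knockout=()):
    """Causal attention weights for one head (hypothetical helper).

    scores:   (seq_len, seq_len) pre-softmax scores, rows are query positions.
    knockout: iterable of (query, key) pairs whose attention is blocked.
    Assumes every query keeps at least one visible key.
    """
    seq_len = scores.shape[0]
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    masked = np.where(causal, scores, -np.inf)
    for q, k in knockout:
        masked[q, k] = -np.inf  # this query can no longer attend to this key
    return softmax(masked)

def ablate_heads(head_outputs, heads):
    """Zero the outputs of the selected heads (zero-ablation is an assumption).

    head_outputs: (n_heads, seq_len, d_head) per-head outputs of one layer.
    heads:        indices of the heads to remove.
    """
    out = head_outputs.copy()
    for h in heads:
        out[h] = 0.0
    return out
```

Ablation removes everything a head contributes, whereas knockout leaves the head in place and blocks only the chosen attention edges, which is why the two can be compared as coarse and fine-grained interventions.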

Paper Structure

This paper contains 39 sections, 1 equation, 108 figures, and 5 tables.

Figures (108)

  • Figure 1: In the sequence "...vintage cars ... vintage", an induction head identifies the initial occurrence of "vintage", attends to the subsequent word "cars" for prefix matching, and predicts "cars" as the next word through the copying mechanism.
  • Figure 2: Prefix matching scores for Llama-3-8B.
  • Figure 3: Each letter-sequence dataset features examples following the respective patterns labelled "Foo" and random sequences labelled "Bar". For word sequence tasks, examples include pairs of semantically categorised words.
  • Figure 4: Change in ICL benefit for Llama-3-8B (top) and InternLM2-20B (bottom), due to head ablations when compared to that of the full model. "1% ind." and "3% ind." denote ablating the top respective percentage of induction heads. "1% rnd." and "3% rnd." denote randomly ablating the respective percentage of all heads in the model.
  • Figure 5: SUL ablation experiments for Llama-3-8B (top) and InternLM2-20B (bottom) on NLP tasks. "1% ind." and "3% ind." denote ablating the top respective percentage of induction heads. "1% rnd." and "3% rnd." denote randomly ablating the respective percentage of all heads in the model.
  • ...and 103 more figures
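
The prefix-matching behaviour described in the Figure 1 and Figure 2 captions can be scored per head from its attention pattern. The sketch below is one common formulation and is offered as an assumption: the paper's exact scoring procedure is not shown on this page, and `prefix_matching_score` is a hypothetical name. For each query token that has occurred earlier in the sequence, it sums the attention placed on the tokens that immediately followed those earlier occurrences, then averages over such queries.

```python
import numpy as np

def prefix_matching_score(attn, tokens):
    """Average attention from repeated tokens to the successors of their
    earlier occurrences (hypothetical helper, one head at a time).

    attn:   (seq_len, seq_len) attention weights, rows are query positions.
    tokens: sequence of token ids of length seq_len.
    """
    tokens = np.asarray(tokens)
    per_query = []
    for i in range(len(tokens)):
        # Earlier positions j holding the same token; the induction target
        # is j + 1. Skip j + 1 == i so self-attention is not counted.
        targets = [j + 1 for j in range(i)
                   if tokens[j] == tokens[i] and j + 1 < i]
        if targets:
            per_query.append(attn[i, targets].sum())
    # Sequences without any repeated token give a score of 0.
    return float(np.mean(per_query)) if per_query else 0.0
```

On a sequence such as a block of random tokens repeated twice, a head that behaves like an ideal induction head scores close to 1, while a head that spreads attention uniformly over the context scores much lower.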