Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning

Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue

TL;DR

The authors propose a geometric framework for in-context learning (ICL) in classification that centers on two properties of query hidden states: separability along a unit direction and alignment with the label-difference vector $(\mathbf{E}_{y_A}-\mathbf{E}_{y_B})$, with $S^{*}$ upper-bounding accuracy ($\mathrm{Acc} \leq S^{*}$). They show a two-stage layer dynamic: early layers increase separability via Previous Token Heads (PTHs), while middle-to-late layers enhance alignment through Induction Heads (IHs) and task-vector effects, enabling the hidden states to become effective task vectors. Across multiple models and tasks (including generation), ICL gains stem mainly from improved alignment, and ablations demonstrate distinct roles for PTHs and IHs. The work bridges attention heads and task vectors within a single geometric account, offering interpretable insights and practical ideas for steering hidden-state representations to improve ICL reliability and efficiency.

Abstract

The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge the gap between attention heads and task vectors, offering a unified account of ICL's underlying mechanisms.

Paper Structure

This paper contains 37 sections, 2 theorems, 14 equations, 70 figures, 24 tables.

Key Result

Theorem 1

$\mathrm{Acc} \leq S^{*}$. The equality is achieved when $\max_{v \in {\mathbb{V}}, v \notin \{y_A,y_B\}}\bm{E}_{v}\bm{h}_i < \max(\bm{E}_{y_A}\bm{h}_i,\bm{E}_{y_B}\bm{h}_i), \forall i$ and $\frac{\bm{E}_{y_A} - \bm{E}_{y_B}}{\Vert\bm{E}_{y_A} - \bm{E}_{y_B}\Vert_2}=c\bm{u}^{*}$ for some positive co
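The relationship between accuracy, separability, and alignment can be illustrated with a small synthetic sketch. It is not the authors' code, and several things in it are assumptions: the excerpt does not define $S^{*}$, so it is treated here as the best accuracy any threshold classifier can reach along a unit direction; the two-label prediction is taken to be the sign of the projection onto the label-difference vector; and all names (`threshold_accuracy`, `report`, `u_sep`, `d_aligned`, `d_misaligned`) and the Gaussian toy data are hypothetical.

```python
import numpy as np

def threshold_accuracy(scores, labels):
    """Best accuracy of a 1-D threshold classifier (either orientation).

    Assumed stand-in for separability along one direction; ties in
    `scores` are not handled specially.
    """
    y = labels[np.argsort(scores)]
    n = len(y)
    # correct[k]: bottom k items predicted 0, remaining items predicted 1
    zeros_prefix = np.concatenate([[0], np.cumsum(y == 0)])
    ones_suffix = (y == 1).sum() - np.concatenate([[0], np.cumsum(y == 1)])
    correct = zeros_prefix + ones_suffix
    return max(correct.max(), (n - correct).max()) / n

def unit(v):
    return v / np.linalg.norm(v)

# Toy "hidden states": two Gaussian clusters separated along u_sep.
rng = np.random.default_rng(0)
dim, n = 16, 200
u_sep = np.zeros(dim)
u_sep[0] = 1.0
labels = np.repeat([1, 0], n)            # 1 = y_A, 0 = y_B
H = rng.normal(size=(2 * n, dim))
H += np.where(labels[:, None] == 1, 3.0, -3.0) * u_sep

def report(d_out):
    """Accuracy of the sign rule along d_out, separability along d_out,
    and cosine alignment between d_out and the separating direction."""
    scores = H @ unit(d_out)
    acc = float(np.mean((scores > 0).astype(int) == labels))
    sep = float(threshold_accuracy(scores, labels))
    align = float(unit(d_out) @ u_sep)
    return acc, sep, align

# Hypothetical label-difference vectors E_{y_A} - E_{y_B}.
e1 = np.zeros(dim)
e1[1] = 1.0
d_aligned = u_sep + 0.1 * e1       # nearly parallel to u_sep
d_misaligned = 0.1 * u_sep + e1    # nearly orthogonal to u_sep

acc_al, sep_al, align_al = report(d_aligned)
acc_mis, sep_mis, align_mis = report(d_misaligned)
print(f"aligned:    acc={acc_al:.3f} sep={sep_al:.3f} cos={align_al:.3f}")
print(f"misaligned: acc={acc_mis:.3f} sep={sep_mis:.3f} cos={align_mis:.3f}")
```

Under these assumptions, the sign rule is one particular threshold classifier along the output direction, so its accuracy cannot exceed the separability along that direction, which in turn cannot exceed the maximum over all directions. The clusters are identical in both runs; only the alignment of the output direction changes, which mirrors the claim that ICL gains show up in alignment rather than separability.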

Figures (70)

  • Figure 1: (A) An example ICL input. (B) In early layers, LLMs promote separability among the clusters of last-token hidden states ($\bm{h}$) w.r.t. the ground-truth labels of the queries through Previous Token Heads (PTHs). (C) In early layers or zero-shot scenarios, the direction along which the hidden states are maximally separated is insufficiently aligned with the output direction (i.e., the difference vector of the label-token unembedding vectors), which increases cluster overlap after mapping and induces higher classification error. (D) In later layers, Induction Heads (IHs) align these clusters towards the output direction, through the same underlying mechanism as task vectors.
  • Figure 2: Comparison of trends in separability and alignment measures: ICL vs. zero-shot. Under ICL, a clear phase transition emerges: separability increases first and then alignment surges. The effective dimension first rises and then declines. This pattern is missing in the zero-shot setting. Accuracy gains from ICL over zero-shot are reflected in alignment, not separability measures.
  • Figure 3: Layer-wise trends of separability and alignment measures in different ICL settings. (A) The phase transition is evident for demonstration counts from 4 to 24. Accuracy improvements from adding demonstrations are reflected in consistently improving alignment measures. (B) Changing the demonstration selection method to kNN retrieval preserves the phase transition and improves accuracy by enhancing alignment. (C) Using uninformative demonstration labels hurts accuracy because the alignment of hidden states decreases, yet a similar phase transition is evident.
  • Figure 4: (A) In the generation setting, the separability score exhibits almost identical trends in the 0-shot and 8-shot cases. (B) In the 8-shot case, the phase transition of composite alignment is pronounced.
  • Figure 5: Dynamics of alignment measures and semantics of post-transition hidden states. (A) Strong correlation between output and directional alignment; (B) The surge in alignment measures concurs with the encoding of label-related semantics; (C) Post-transition layers, except for the last ones, filter out unrelated semantics and retain the related ones. Refer to the supplementary section (label sec:supp_layer) for more semantics decoding cases.
  • ...and 65 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2