Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue
TL;DR
The authors propose a geometric framework for in-context learning (ICL) in classification that centers on two properties of query hidden states: separability along a unit direction and alignment with the label-difference vector $(\mathbf{E}_{y_A}-\mathbf{E}_{y_B})$, with a quantity $S^{*}$ bounding accuracy. They report a two-stage dynamic across layers: early layers increase separability via Previous Token Heads (PTHs), while middle-to-late layers enhance alignment through Induction Heads (IHs) and task-vector effects, which lets the hidden states act as effective task vectors. Across multiple models and tasks (including generation), ICL gains stem mainly from improved alignment, and ablations show distinct roles for PTHs and IHs. The work links attention heads and task vectors in a single geometric account and suggests ways to steer hidden-state representations to improve ICL reliability and efficiency.
Abstract
The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge the gap between attention heads and task vectors, offering a unified account of ICL's underlying mechanisms.
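To make the two geometric factors concrete, here is a minimal NumPy sketch on synthetic data. It is an illustration under stated assumptions, not the paper's definitions or code: separability is approximated by the accuracy of a midpoint threshold along the class-mean-difference direction, alignment by the cosine between that direction and a label-difference vector, and the "hidden states" are Gaussian toy vectors. All function names and the vectors `e_a`, `e_b` are hypothetical.

```python
import numpy as np


def unit_mean_difference(h_a, h_b):
    """Unit vector pointing from the class-B mean to the class-A mean."""
    d = h_a.mean(axis=0) - h_b.mean(axis=0)
    return d / np.linalg.norm(d)


def separability(h_a, h_b, d):
    """Accuracy of a midpoint threshold on projections onto direction d.

    Assumes class A projects higher than class B along d (true when d is
    the mean-difference direction). This is a proxy, not the paper's metric.
    """
    d = d / np.linalg.norm(d)
    p_a, p_b = h_a @ d, h_b @ d
    threshold = 0.5 * (p_a.mean() + p_b.mean())
    correct = np.sum(p_a > threshold) + np.sum(p_b <= threshold)
    return float(correct) / (len(p_a) + len(p_b))


def alignment(d, e_a, e_b):
    """Cosine similarity between d and the label-difference vector e_a - e_b."""
    diff = e_a - e_b
    return float(d @ diff / (np.linalg.norm(d) * np.linalg.norm(diff)))


def readout_accuracy(h_a, h_b, e_a, e_b):
    """Accuracy when label A is predicted iff its logit exceeds label B's."""
    diff = e_a - e_b
    correct = np.sum(h_a @ diff > 0) + np.sum(h_b @ diff <= 0)
    return float(correct) / (len(h_a) + len(h_b))


def make_toy_states(rng, n=200, dim=16, shift=3.0, noise=0.5):
    """Two Gaussian clusters separated along the first coordinate axis."""
    axis = np.zeros(dim)
    axis[0] = 1.0
    h_a = noise * rng.normal(size=(n, dim)) + shift * axis
    h_b = noise * rng.normal(size=(n, dim)) - shift * axis
    return h_a, h_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_a, h_b = make_toy_states(rng)
    d = unit_mean_difference(h_a, h_b)

    basis = np.eye(h_a.shape[1])
    zero = np.zeros(h_a.shape[1])
    # Hypothetical label vectors: one pair whose difference lies along the
    # separating axis, one pair whose difference is orthogonal to it.
    cases = {"aligned": (basis[0], -basis[0]), "orthogonal": (basis[1], zero)}

    print("separability:", separability(h_a, h_b, d))
    for name, (e_a, e_b) in cases.items():
        print(name,
              "alignment:", round(alignment(d, e_a, e_b), 3),
              "readout accuracy:", readout_accuracy(h_a, h_b, e_a, e_b))
```

In this toy setting the two clusters are equally separable in both cases, yet a readout through the label-difference vector only recovers the classes when that vector is aligned with the separating direction. This mirrors, in a simplified way, why separability alone does not guarantee accuracy.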
