How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning

Zeping Yu, Sophia Ananiadou

TL;DR

The paper studies the mechanism of in-context learning (ICL) by identifying a small set of in-context heads; intervening in these heads drastically changes ICL performance on sentence classification with semantically unrelated labels. It proposes a two-tower metric-learning hypothesis in which value-output matrices extract label features and query-key matrices compute the similarity between the last position and each label position, which controls how much label information flows to the last position. Across five datasets, 12 in-context heads are shown to govern ICL, and two bias-mitigation methods are proposed that reduce majority label bias by 22% and recency bias by 17%, respectively. The work gives a concrete mechanistic account of ICL and practical strategies for reducing biases that arise from demonstrations.

Abstract

We investigate the mechanism of in-context learning (ICL) on sentence classification tasks with semantically unrelated labels ("foo"/"bar"). We find that intervening in only 1% of attention heads (named "in-context heads") reduces ICL accuracy from 87.6% to 24.4%. To understand this phenomenon, we analyze the value-output vectors in these heads and discover that the vectors at each label position contain substantial information about the corresponding labels. Furthermore, we observe that the prediction shift from "foo" to "bar" is due to a reduction in these heads' attention scores at "foo" positions and an increase at "bar" positions. Therefore, we propose a hypothesis for ICL: in in-context heads, the value-output matrices extract label features, while the query-key matrices compute the similarity between the features at the last position and those at each label position. The query and key matrices can be considered as two towers that learn the similarity metric between the last position's features and each demonstration's features at the label positions. Using this hypothesis, we explain the majority label bias and recency bias in ICL and propose two methods that reduce these biases by 22% and 17%, respectively.
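
The two-tower reading of an in-context head can be illustrated with a toy single-head computation. This is a minimal sketch under stated assumptions, not the authors' code: the function name `two_tower_head`, the 2-dimensional features, and the identity weight matrices are all hypothetical, and the softmax is restricted to label positions for simplicity, whereas a real head attends over every earlier position.

```python
import math

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def two_tower_head(last_hidden, label_hiddens, W_q, W_k, W_vo):
    """Toy attention head restricted to label positions (illustrative only).

    Query tower: W_q applied to the last position's features.
    Key tower:   W_k applied to each label position's features.
    W_vo stands in for the combined value-output map.
    """
    q = matvec(W_q, last_hidden)
    keys = [matvec(W_k, h) for h in label_hiddens]
    scale = math.sqrt(len(q))
    # Scaled dot product between the two towers' outputs.
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    # Label features extracted at each label position.
    values = [matvec(W_vo, h) for h in label_hiddens]
    dim = len(values[0])
    # Attention-weighted label information written to the last position.
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

# Hypothetical features: the query sentence resembles the first demonstration.
identity = [[1.0, 0.0], [0.0, 1.0]]
label_hiddens = [[1.0, 0.0], [0.0, 1.0]]  # features at the "foo" and "bar" positions
last_hidden = [2.0, 0.0]
weights, mixed = two_tower_head(last_hidden, label_hiddens, identity, identity, identity)
```

In this toy setup the first label position receives the larger attention weight, so more of its label feature reaches the last position. Under the hypothesis, shifting attention from "foo" positions to "bar" positions is what shifts the prediction.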

Paper Structure (22 sections, 4 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: Hypothesis of the ICL mechanism. (a) Shallow layers merge features into the label positions and the last position. In in-context heads, (b) the value-output matrix VO extracts label information. (c) The query matrix Q and (d) the key matrix K compute (e) the similarity scores between the last position and each demonstration, which decide how much label information is transferred into the last token.
  • Figure 2: Attention scores on "foo" positions in fooheads and "bar" positions in barheads, on the original and imbalanced datasets, in Llama (left) and GPT-J (right).
  • Figure 3: Attention scores on "foo" positions in fooheads and "bar" positions in barheads, on the original and reverse datasets, in Llama (left) and GPT-J (right).
  • Figure 4: Attention scores on "foo"/"bar" positions in the original, imbalanced, and recency datasets in Llama.
  • Figure 5: Attention scores on "foo"/"bar" positions in the original, imbalanced, and recency datasets in GPT-J.
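
The quantity plotted in Figures 2 to 5, attention on "foo" and "bar" positions, can be illustrated with a small helper. This is only a sketch: the function name `label_attention_mass`, the example tokens, and the attention values are hypothetical, and summing per label is an assumption, since the listing above does not say whether the paper sums or averages across positions.

```python
def label_attention_mass(attn_row, tokens, labels=("foo", "bar")):
    """Sum one attention row (last position -> every position) per label token."""
    if len(attn_row) != len(tokens):
        raise ValueError("attention row and token list must have equal length")
    mass = {label: 0.0 for label in labels}
    for weight, token in zip(attn_row, tokens):
        if token in mass:
            mass[token] += weight
    return mass

# Hypothetical prompt tokens and a made-up attention row for one head.
tokens = ["good", "movie", "foo", "dull", "plot", "bar", "great", "cast"]
attn_row = [0.05, 0.05, 0.40, 0.05, 0.05, 0.20, 0.10, 0.10]
mass = label_attention_mass(attn_row, tokens)
```

Comparing these per-label totals between an original prompt and an imbalanced or reordered prompt is one way to see how demonstration counts and order move attention toward one label.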