Revisiting In-context Learning Inference Circuit in Large Language Models

Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue

TL;DR

This paper investigates the mechanism of in-context learning (ICL) in large language models by proposing a unified three-step inference circuit: Step 1 Input Text Encode, Step 2 Semantics Merge, and Step 3 Feature Retrieval and Copy. It validates the circuit across multiple models (e.g., Llama 3 70B, Falcon 40B) and six real-world classification datasets, using a centroid-like probe and kernel-alignment analyses. The results show that the three-step circuit captures diverse ICL phenomena, including positional bias, label-noise robustness, and demonstration saturation. Ablations indicate that the circuit is the dominant mechanism, while also revealing bypass mechanisms that operate in parallel. The findings offer a practical, mechanistic explanation of ICL with implications for architecture design and potential early-exit inference.
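The two analysis tools named above can be illustrated with a minimal sketch. It assumes hidden states are already available as a NumPy array; the data here is synthetic, and all function names are hypothetical. The alignment measure is linear centered kernel alignment (CKA), a common stand-in that may differ from the exact kernel-alignment definition used in the paper.

```python
import numpy as np

def fit_centroids(features, labels):
    """Return (classes, centroids): one mean vector per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroid(features, classes, centroids):
    """Assign each sample to the class of its nearest centroid."""
    # Squared Euclidean distance from every sample to every centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

def linear_cka(x, y):
    """Linear CKA between two feature matrices with the same number of rows.

    Assumption: used here as a generic alignment score, not necessarily
    the paper's own metric.
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

# Synthetic stand-in for hidden states: two well-separated class clusters.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
means = np.array([[-2.0] * 16, [2.0] * 16])
hidden = means[labels] + rng.normal(size=(100, 16))

classes, centroids = fit_centroids(hidden, labels)
accuracy = (predict_centroid(hidden, classes, centroids) == labels).mean()
self_alignment = linear_cka(hidden, hidden)
```

In a real analysis, `hidden` would be replaced by hidden states extracted from one layer of the model, and the second argument of `linear_cka` by features from a reference encoder.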

Abstract

In-context Learning (ICL) is an emerging few-shot learning paradigm for Language Models (LMs) whose inner mechanisms remain unexplored. Existing works describe the inner processing of ICL, but they struggle to capture all the inference phenomena observed in large language models. Therefore, this paper proposes a comprehensive circuit that models the inference dynamics and tries to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into linear representations in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search for the joint representations of demonstrations that are similar to the query representation on a task subspace, and copy the retrieved representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. Through careful measurements, the proposed inference circuit successfully captures and unifies many fragmented phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis shows that disabling the proposed steps seriously damages ICL performance, suggesting that the proposed inference circuit is a dominant mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.
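As a rough illustration of the three operations, the following toy sketch treats Step 1 as already done (the text representations are given), models Step 2 as pairing each demonstration representation with its label embedding, and models Step 3 as a softmax-weighted retrieval followed by a copy. Everything here is an assumption made for illustration: the function `toy_icl_circuit`, the one-hot label embeddings, and the dot-product similarity are hypothetical and are not the mechanism measured inside a real LM.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def toy_icl_circuit(demo_reps, demo_labels, query_rep, label_embeds, temperature=1.0):
    """Toy three-step circuit. Returns (predicted label index, retrieval weights)."""
    # Step 1 (Input Text Encode) is assumed done: demo_reps and query_rep
    # are the encoded representations of demonstrations and query.
    # Step 2 (Semantics Merge): pair each demonstration representation (key)
    # with the embedding of its label (value).
    keys = demo_reps
    values = label_embeds[demo_labels]
    # Step 3a (Feature Retrieval): score demonstrations by similarity to the query.
    weights = softmax(keys @ query_rep / temperature)
    # Step 3b (Copy): move the weighted label information into the query position.
    copied = weights @ values
    # Decoding: pick the label whose embedding best matches the copied vector.
    logits = label_embeds @ copied
    return int(logits.argmax()), weights

# Four hypothetical demonstrations in a 2-d feature space, two labels.
demo_reps = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
demo_labels = np.array([0, 0, 1, 1])
label_embeds = np.eye(2)

pred_a, weights_a = toy_icl_circuit(demo_reps, demo_labels, np.array([0.2, 1.0]), label_embeds)
pred_b, _ = toy_icl_circuit(demo_reps, demo_labels, np.array([1.0, 0.1]), label_embeds)
```

The first query lies closer to the label-1 demonstrations and the second closer to the label-0 demonstrations, so the copied label information differs accordingly.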

Paper Structure (39 sections, 15 equations, 44 figures, 8 tables)


Figures (44)

  • Figure 1: The 3-phase inference diagram of ICL. Step 1: LMs encode every input text into representations, Step 2: LMs merge the encoded text representations of demonstrations with their corresponding label semantics, Step 3: LMs retrieve merged label-text representations similar to the encoded query, and copy the retrieved representations into the query representation.
  • Figure 2: Input text encoding magnitudes (measured by kernel alignment with features encoded by an encoder-structured model) of hidden states in various layers in the ICL scenario (the controlled experiments are results between the current 6 datasets and TEE [Mohammad et al., 2018]). Left: Encoding magnitudes on hidden states from various types of tokens. Middle: Encoding magnitudes with different $k$ on the forerunner tokens. Right: Encoding magnitudes in layer 24 of Llama 3 70B against the causal language modeling loss of the input text with (upper) $k = 0$ and (lower) $k = 8$.
  • Figure 3: Test results of centroid classifier trained on ICL hidden states. Solid: Centroid classification accuracy, Dotted: Kernel alignment.
  • Figure 4: The similarities of ICL hidden states in different positions on layer 24 between Left: the same queries, Right: two different queries (on SST-2).
  • Figure 5: Hidden states copy magnitude from forerunner tokens to label tokens against layers. Left: Kernel alignment between the forerunner token (the copy source) and the abstract label token of the next layer (the copy target). Middle: Curves: The count of marked forerunner token heads with correct and wrong labels; Colored Areas: The maximum attention scores from forerunner token to query (copy magnitude) with correct and wrong labels (detailed attention head statistical data is in Appendix \ref{sec:attention_head_stat}). Right: Centroid classifier results predicted on the hidden states of correct and wrong label tokens, on SST-2 and MR. Solid: Predicted by classifiers $\mathcal{C}_s$ trained on hidden states of forerunner tokens. Dotted: Predicted by classifiers $\mathcal{C}_y$ trained on hidden states of label tokens.
  • ...and 39 more figures