Universal Response and Emergence of Induction in LLMs

Niclas Luick

TL;DR

We find that LLMs exhibit a robust, universal regime in which their response remains scale-invariant under changes in perturbation strength, which allows us to quantify the build-up of token correlations throughout the model.

Abstract

While induction is considered a key mechanism for in-context learning in LLMs, understanding its precise circuit decomposition beyond toy models remains elusive. Here, we study the emergence of induction behavior within LLMs by probing their response to weak single-token perturbations of the residual stream. We find that LLMs exhibit a robust, universal regime in which their response remains scale-invariant under changes in perturbation strength, thereby allowing us to quantify the build-up of token correlations throughout the model. By applying our method, we observe signatures of induction behavior within the residual stream of Gemma-2-2B, Llama-3.2-3B, and GPT-2-XL. Across all models, we find that these induction signatures gradually emerge within intermediate layers and identify the relevant model sections composing this behavior. Our results provide insights into the collective interplay of components within LLMs and serve as a benchmark for large-scale circuit analysis.

Paper Structure

This paper contains 9 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Probing the response of LLMs. We weakly perturb the residual stream vector $\mathbf{x}^{(0)}$ directly at the transformer input ($\ell=0$) for a single sequence position, $i=0,\dots, T-1$, by applying a scaling transformation $x_i{}^{(0)} \rightarrow {x}_i'{}^{(0)}(\varepsilon)=(1-\varepsilon)\cdot x_i{}^{(0)}$, while leaving all other sequence positions unchanged. To measure the response of the model, we compare the evolution of this perturbed vector $\mathbf{x'}{}^{(\ell)}$ to the unperturbed vector $\mathbf{x}^{(\ell)}$ for all downstream positions $\ell$ and token positions $j=0,\dots, T-1$ using three different response metrics $\mathbf{C}^{(\ell)}_{\Delta}$, $\mathbf{C}^{(\ell)}_{\varphi}$, $\mathbf{C}^{(\ell)}_{\vartheta} \in \mathbb{R}^{T\times T}$. As a benchmark, we apply this method to repeated subsequences of length $T_0 = T/2$, consisting of random tokens sampled uniformly from the vocabulary space.
  • Figure 2: Scale-invariant response. (a, d) Response matrices $\mathbf{C}^{(\ell)}_{\Delta}$ (a) and $\mathbf{C}^{(\ell)}_{\varphi}$ (d) for a repeated sequence of $T_0 = 64$ random tokens, a weak perturbation $(\varepsilon=\varepsilon_0=0.05)$, and the last position of the residual stream $(\ell=2L)$, using a pre-trained Gemma-2-2B model ($L=26$). (b, e) The rescaled response functions $\overline{C}{}^{(\ell)}_{\Delta} / \chi_{\Delta}$ (b) and $\overline{C}{}^{(\ell)}_{\varphi} / \chi_{\varphi}$ (e) collapse the unscaled data (insets) onto a single, $\varepsilon$-independent curve with a pronounced peak at $\Delta j = T_0-1$ (dashed vertical lines) as a signature of induction behavior. (c, f) For small perturbations $(\varepsilon < 0.1)$, the ratios $\overline{C}{}^{(\ell)}_{\Delta} / \overline{C}{}^{(\ell)}_{\Delta}(\varepsilon_0, \Delta j)$ (c) and $\overline{C}{}^{(\ell)}_{\varphi} / \overline{C}{}^{(\ell)}_{\varphi}(\varepsilon_0, \Delta j)$ (f) are independent of $\Delta j$ and well approximated by $\chi_{\Delta} \approx \varepsilon/\varepsilon_0$ (c, grey dashed line) and $\chi_{\varphi} \approx \left(\varepsilon/\varepsilon_0\right)^2$ (f, grey dashed line), thereby demonstrating the scale-invariance of the response. All data shown in (a-f) are obtained by averaging over a batch of $32$ sequences.
  • Figure 3: Scale-invariant response across model layers. (a, b, c) Evolution of the response functions $\overline{C}{}^{(\ell)}_{\Delta}$ (a, inset), $\overline{C}{}^{(\ell)}_{\varphi}$ (b, inset), and $\overline{C}{}^{(\ell)}_{\vartheta}$ (c) across the residual stream of Gemma-2-2B, for various values of the scale parameter $\varepsilon$, at a fixed value of $\Delta j = T_0 - 1$, using the same sequence data as in Fig. 2. Rescaling with $\ell$-independent scaling functions $\chi_{\Delta}$ and $\chi_{\varphi}$ collapses $\overline{C}{}^{(\ell)}_{\Delta}$ (a) and $\overline{C}{}^{(\ell)}_{\varphi}$ (b) onto a single curve, nearly independent of the value of $\varepsilon$ over a wide range in perturbation strength. (d, e, f) Increments $\Delta C^{(\ell)}_{\Delta, \varphi, \vartheta} = \overline{C}{}^{(\ell)}_{\Delta, \varphi, \vartheta} - \overline{C}{}^{(\ell-1)}_{\Delta, \varphi, \vartheta}$ reveal the contribution of individual MHA (green bars) and MLP sublayers (blue bars) to the response functions $\overline{C}{}^{(\ell)}_{\Delta}$ (d), $\overline{C}{}^{(\ell)}_{\varphi}$ (e), and $\overline{C}{}^{(\ell)}_{\vartheta}$ (f), for each position $\ell$ in the model. The values of $\Delta C^{(\ell)}_{\Delta}$ and $\Delta C^{(\ell)}_{\varphi}$ are each normalized such that $\sum_{\ell}\Delta C_{\Delta, \varphi}^{(\ell)} = 1$. Partial sums over increments, $\Sigma = \sum_{\ell \text{ even/odd}}\Delta C_{\Delta, \varphi, \vartheta}^{(\ell)}$, reveal the total contribution of MHA ($\ell$ odd) and MLP sublayers ($\ell$ even) to the response (insets). All data shown are obtained by averaging over a batch of $32$ sequences.
  • Figure 4: Emergence of induction signatures in Gemma-2-2B. (a-f) Response functions $\overline{C}{}^{(\ell)}_{\Delta}$ (a, d), $\overline{C}{}^{(\ell)}_{\varphi}$ (b, e), and $\overline{C}{}^{(\ell)}_{\vartheta}$ (c, f), for varying values of $\Delta j$ and $\ell$, using a weak perturbation $(\varepsilon=0.05)$ and a repeated sequence of $T_0=64$ random tokens. Each row of the images shown in (a, b) is normalized to the maximum value of $\overline{C}{}^{(\ell)}_{\Delta, \varphi}$ within the range $59 \le \Delta j \le 69$. As a signature of the onset of induction behavior within the model around $\ell\gtrsim 30$, we observe a shift from correlations between the same tokens $(\Delta j = T_0)$ to their previous tokens $(\Delta j = T_0 - 1)$ of the repeated subsequence. All data shown are obtained by averaging over a batch of $32$ sequences.
  • Figure 5: Emergence of induction across models. (a, b, c) The ratios $\overline{C}{}^{(\ell)}_{\Delta} / \overline{C}{}^{(\ell)}_{\Delta}(\Delta j, \varepsilon_0)$ for the last layer $(\ell = 2L)$ in GPT-2-XL (a, $L=48$), Llama-3.2-3B (b, $L=28$), and Gemma-2-2B (c, $L=26$) display the same universal scaling behavior for sufficiently weak perturbations $\varepsilon<\varepsilon_0$, where $\chi_{\Delta} \approx \varepsilon/\varepsilon_0$ (grey dashed lines), and $\varepsilon_0=0.03$ (grey vertical lines). Across all models, the relative deviation from the scaling function, $\delta=(\overline{C}{}^{(\ell)}_{\Delta} / \overline{C}{}^{(\ell)}_{\Delta}(\Delta j, \varepsilon_0) - \chi_\Delta) / \chi_\Delta$, averaged over all values of $\Delta j$, signals scale-invariant behavior over two orders of magnitude in $\varepsilon$ (insets). (d, e, f) Response functions $\overline{C}{}^{(\ell)}_{\Delta}$ for GPT-2-XL (d), Llama-3.2-3B (e), and Gemma-2-2B (f), for varying values of $\Delta j$ and $\ell$. For each value of $\ell$, $\overline{C}{}^{(\ell)}_{\Delta}$ is normalized to its maximum value within the displayed range $59 \le \Delta j \le 69$. All data shown are obtained using the same data set as in Figs. 2-4.
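
The probing procedure described in the captions of Figures 1 and 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: `toy_model` is a hypothetical causal stand-in for a transformer (it will not show induction peaks), the response metric is assumed to be the L2 norm of the output difference (the captions name $\mathbf{C}_{\Delta}$ but do not define it), and the overline is assumed to mean an average over all pairs $(i, j)$ with fixed $\Delta j = j - i$. What the sketch does take from the captions is the scaling perturbation $x_i \rightarrow (1-\varepsilon)\, x_i$ at a single position, the repeated random-token sequence of length $T = 2T_0$, and the check that the ratio to a reference strength $\varepsilon_0$ is close to $\varepsilon/\varepsilon_0$ for weak perturbations.

```python
import numpy as np

def repeated_random_tokens(T0, vocab_size, rng):
    """Uniformly random tokens of length T0, repeated once (T = 2 * T0)."""
    first_half = rng.integers(0, vocab_size, size=T0)
    return np.concatenate([first_half, first_half])

def toy_model(x, W):
    """Hypothetical stand-in for an LLM: causal averaging, then a tanh update."""
    T = x.shape[0]
    causal = np.tril(np.ones((T, T)))
    causal /= causal.sum(axis=1, keepdims=True)  # position j sees positions <= j
    return x + np.tanh(causal @ x @ W)

def response_matrix(x, W, eps):
    """C[i, j]: response at position j to scaling position i by (1 - eps).

    Assumed metric: L2 norm of the difference between perturbed and
    unperturbed outputs (the source does not define its metrics here).
    """
    T = x.shape[0]
    y = toy_model(x, W)
    C = np.zeros((T, T))
    for i in range(T):
        x_pert = x.copy()
        x_pert[i] = (1.0 - eps) * x[i]  # all other positions stay unchanged
        C[i] = np.linalg.norm(toy_model(x_pert, W) - y, axis=-1)
    return C

def diagonal_average(C):
    """Assumed definition of the averaged response: mean of C[i, i + dj] over i."""
    T = C.shape[0]
    return np.array([np.diagonal(C, offset=dj).mean() for dj in range(T)])

rng = np.random.default_rng(0)
T0, vocab_size, d = 4, 50, 16
tokens = repeated_random_tokens(T0, vocab_size, rng)
E = rng.normal(size=(vocab_size, d))            # hypothetical embedding table
W = 0.5 * rng.normal(size=(d, d)) / np.sqrt(d)  # hypothetical mixing weights
x = E[tokens]                                   # input vectors, shape (T, d)

eps0 = 0.05
C0 = diagonal_average(response_matrix(x, W, eps0))
for eps in (0.005, 0.01, 0.02):
    ratio = diagonal_average(response_matrix(x, W, eps)) / C0
    print(f"eps={eps}: ratio={ratio.round(3)}, eps/eps0={eps / eps0:.3f}")
```

In this toy setting the near-linear ratio simply reflects the first-order response of a smooth function; the point of the sketch is the bookkeeping (perturb one position, compare at every position, average over fixed $\Delta j$, divide by the reference curve), which is the same bookkeeping one would apply to hidden states extracted from a real model.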