In-Context Algebra

Eric Todd, Jannik Brinkmann, Rohit Gandikota, David Bau

TL;DR

The paper probes how transformers perform in-context algebra when token meanings are not fixed but are determined within each sequence by underlying group structure. By designing a task over finite groups and analyzing targeted data distributions, the study reveals that models develop symbolic mechanisms (commutative copying, identity recognition, and closure-based cancellation) and can generalize to unseen groups. Through causal interventions and subspace probes, the authors validate these mechanisms and show phase transitions marking discrete skill acquisition. The findings indicate that in-context reasoning with variable binding tends toward symbolic strategies rather than geometric embeddings, with implications for understanding and improving reasoning in settings where symbol meanings are context-dependent.

Abstract

We investigate the mechanisms that arise when transformers are trained to solve arithmetic on sequences where tokens are variables whose meaning is determined only through their interactions. While prior work has found that transformers develop geometric embeddings that mirror algebraic structure, those previous findings emerge from settings where arithmetic-valued tokens have fixed meanings. We devise a new task in which the assignment of symbols to specific algebraic group elements varies from one sequence to another. Despite this challenging setup, transformers achieve near-perfect accuracy on the task and even generalize to unseen algebraic groups. We develop targeted data distributions to create causal tests of a set of hypothesized mechanisms, and we isolate three mechanisms models consistently learn: commutative copying where a dedicated head copies answers, identity element recognition that distinguishes identity-containing facts, and closure-based cancellation that tracks group membership to constrain valid answers. Complementary to the geometric representations found in fixed-symbol settings, our findings show that models develop symbolic reasoning mechanisms when trained to reason in-context with variables whose meanings are not fixed.
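To make the three mechanisms concrete, here is a minimal symbolic sketch of what each rule computes on a context of facts. It is an illustrative analogue only, not the trained model's circuit or the authors' code. The `xy=z` fact format with comma separators, one-character symbols, the function names `parse` and `predict`, and the assumption that the underlying group is commutative are all hypothetical choices made for this example.

```python
def parse(context):
    """Split 'ab=c,ba=c' into [('a', 'b', 'c'), ...] (one-character symbols assumed)."""
    return [(f[0], f[1], f[3]) for f in context.split(",")]


def predict(context, x, y):
    """Return the set of answers for x*y that the in-context facts still allow."""
    facts = parse(context)

    # 1. Copying: reuse the answer of the exact fact, else of the swapped fact.
    #    The swapped lookup is only valid if the group is commutative (assumed here).
    for query in ((x, y), (y, x)):
        for a, b, c in facts:
            if (a, b) == query:
                return {c}

    # 2. Identity recognition: a fact a*b=b reveals a as the identity,
    #    and a fact a*b=a reveals b as the identity.
    identities = {a for a, b, c in facts if c == b}
    identities |= {b for a, b, c in facts if c == a}
    if x in identities:
        return {y}
    if y in identities:
        return {x}

    # 3. Closure-based cancellation.
    #    Closure: symbols that co-occur in a fact belong to the same group,
    #    so the answer must come from the group containing x and y.
    group, changed = {x, y}, True
    while changed:
        changed = False
        for fact in facts:
            if group & set(fact) and not set(fact) <= group:
                group |= set(fact)
                changed = True
    #    Cancellation: if x*b=c with b != y, then x*y cannot equal c
    #    (and symmetrically for a*y=c with a != x).
    ruled_out = {c for a, b, c in facts
                 if (a == x and b != y) or (b == y and a != x)}
    return group - ruled_out


# Example over Z_3 with the hypothetical labeling a=0, b=1, c=2.
print(predict("bb=c,bc=a", "b", "a"))
```

In the example, neither copying nor identity recognition applies, so the answer is narrowed purely by elimination within the group.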

Paper Structure

This paper contains 25 sections, 8 equations, 18 figures, 2 tables.

Figures (18)

  • Figure 1: An overview of the data generation process. (a) Variable Assignment: We sample a set of algebraic groups and assign the elements of each group a non-overlapping set of vocabulary symbols. (b) Sequence Generation: Sampled facts are converted into variable statements via the latent mapping $\varphi_s$ and concatenated together to form a sequence. (c) Sample Diversity: Every sequence is constructed by sampling a new set of groups, defining a new latent mapping, and sampling a new string of facts. The vocabulary symbols are assigned specific meanings within individual sequences, but can take on very different meanings across sequences.
  • Figure 2: In-context algebra performance.
  • Figure 3: Algorithmic coverage: (a) the percentage of training data that can be solved by each mechanism: copying (green), commutative copying (purple), identity recognition (yellow), closure-based elimination (red), associativity (blue), compared to the empirical model performance (black). The gray shaded region represents unexplained performance. (b) Coverage of sequences where neither form of copying is possible. Identity recognition solves $28.7\%$ of the problems (yellow), closure-based cancellation can solve an additional $39.1\%$ (red) and associativity solves $16.9\%$ (blue). Model performance on hold-out sequences is shown in black. (c) The model achieves high accuracy on almost all algorithmic distributions ($97$-$100\%$), except for associative composition ($60\%$).
  • Figure 4: An analysis of the copying mechanism. Attention patterns (a-d) and direct logit contributions (e-h) of the copying head (layer 3, head 6) across variations of the same algebra sequence. (a) When verbatim copying is possible, the head attends to the answer-slot of the previous fact "$kc=f$" and (e) directly promotes that token's logit (green). (b) When the exact fact is absent, the head's attention shifts to the answer-slot and predictive token of the commutative fact "$ck=f$" and (f) promotes that token (purple). Note this fact was also in (a) but not attended to, indicating exact facts take precedence over commutative ones. (c) When both exact and commutative facts are absent, the head often self-attends and (g) no longer strongly promotes one token. (d) When injecting a matching "corrupted" fact with an incorrect answer ("$kc=j$", red), the head attends to each answer-slot and (h) promotes both variables (green, red).
  • Figure 5: Identity Recognition. (a) PCA decomposition of fact hidden states at the final attention layer reveals a clear separation of identity facts (blue) and non-identity facts (red). (b) Head 3.1 promotes the logits of both variables in the query ($a$ and $e$), while head 3.6 demotes the logit of the identity variable, $e$. (c) PCA steering on its own can induce identity behavior, but it promotes both variables in the query to have near-equal logits. Inserting a false identity fact for either query variable triggers identity demotion, which, along with PCA steering, achieves cleaner identity control.
  • ...and 13 more figures
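
The three steps of the data generation process in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses only cyclic groups, a 16-letter vocabulary, an `xy=z` fact format joined by commas, and the hypothetical helper names `cyclic_table` and `make_sequence`. The dictionary `phi` stands in for the latent mapping $\varphi_s$.

```python
import random
from itertools import product


def cyclic_table(n):
    """Cayley table of the cyclic group Z_n: (i, j) -> (i + j) mod n."""
    return {(i, j): (i + j) % n for i, j in product(range(n), repeat=2)}


def make_sequence(orders, vocab, n_facts, rng):
    """Build one sequence; returns (text, phi), phi maps (group, element) -> symbol."""
    # (a) Variable assignment: disjoint symbols for each group's elements.
    symbols = rng.sample(vocab, sum(orders))
    phi, pool, offset = {}, [], 0
    for g, n in enumerate(orders):
        for i in range(n):
            phi[(g, i)] = symbols[offset + i]
        offset += n
        # Every fact of this group, rewritten as a variable statement via phi.
        pool += [(phi[(g, i)], phi[(g, j)], phi[(g, k)])
                 for (i, j), k in cyclic_table(n).items()]
    # (b) Sequence generation: sample facts and concatenate them.
    facts = [rng.choice(pool) for _ in range(n_facts)]
    return ",".join(f"{x}{y}={z}" for x, y, z in facts), phi


# (c) Sample diversity: each sequence draws new groups, a new mapping, new facts.
rng = random.Random(0)
vocab = list("abcdefghijklmnop")  # assumed vocabulary for the sketch
for _ in range(2):
    orders = rng.sample([2, 3, 4, 5], 2)  # assumed pool of group orders
    text, phi = make_sequence(orders, vocab, 6, rng)
    print(orders, text)
```

Because the mapping is redrawn for every sequence, a symbol such as `a` may denote the identity of one group in one sequence and a non-identity element of a different group in the next, so a model cannot rely on fixed token meanings.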