In-Context Algebra
Eric Todd, Jannik Brinkmann, Rohit Gandikota, David Bau
TL;DR
The paper probes how transformers perform in-context algebra when token meanings are not fixed but are determined within each sequence by an underlying group structure. By designing a task over finite groups and analyzing targeted data distributions, the study reveals that models develop three symbolic mechanisms (commutative copying, identity recognition, and closure-based cancellation) and can generalize to unseen groups. Through causal interventions and subspace probes, the authors validate these mechanisms and show phase transitions marking discrete skill acquisition. The findings indicate that in-context reasoning with variable binding tends toward symbolic strategies rather than geometric embeddings, with implications for understanding and improving reasoning in settings where symbol meanings are context-dependent.
Abstract
We investigate the mechanisms that arise when transformers are trained to solve arithmetic on sequences where tokens are variables whose meaning is determined only through their interactions. While prior work has found that transformers develop geometric embeddings that mirror algebraic structure, those findings emerge from settings where arithmetic-valued tokens have fixed meanings. We devise a new task in which the assignment of symbols to specific algebraic group elements varies from one sequence to another. Despite this challenging setup, transformers achieve near-perfect accuracy on the task and even generalize to unseen algebraic groups. We develop targeted data distributions to create causal tests of a set of hypothesized mechanisms, and we isolate three mechanisms that models consistently learn: commutative copying, where a dedicated head copies answers; identity element recognition, which distinguishes identity-containing facts; and closure-based cancellation, which tracks group membership to constrain valid answers. Complementary to the geometric representations found in fixed-symbol settings, our findings show that models develop symbolic reasoning mechanisms when trained to reason in-context with variables whose meanings are not fixed.
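To make the task setup concrete, the following is a minimal sketch of how one training sequence might be sampled when the symbol-to-element assignment is redrawn for every sequence. It is an illustration under stated assumptions, not the authors' data pipeline: the choice of the cyclic group Z_n, the letter vocabulary, the triple format for facts, and the names `cyclic_group_table` and `sample_sequence` are all hypothetical.

```python
import random


def cyclic_group_table(n):
    """Cayley table of the cyclic group Z_n under addition mod n (assumed example group)."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]


def sample_sequence(n, vocab, num_facts, rng):
    """Sample facts (x, y, z) meaning "x * y = z" under a fresh symbol binding.

    Hypothetical helper: the binding of symbols to group elements is drawn
    anew on every call, so a symbol carries no meaning across sequences.
    """
    table = cyclic_group_table(n)
    # Element i is written as symbols[i] in this sequence only.
    symbols = rng.sample(vocab, n)
    facts = []
    for _ in range(num_facts):
        a, b = rng.randrange(n), rng.randrange(n)
        facts.append((symbols[a], symbols[b], symbols[table[a][b]]))
    return symbols, facts


rng = random.Random(0)
vocab = list("abcdefghijklmnop")  # assumed token vocabulary
symbols, facts = sample_sequence(5, vocab, 8, rng)
for x, y, z in facts:
    print(f"{x} {y} = {z}")
```

Calling `sample_sequence` again yields a different binding, so the same letter may stand for the identity in one sequence and for a non-identity element in the next. A model can therefore only answer a query by reading the facts given earlier in the same sequence.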
