Understanding Addition and Subtraction in Transformers
Philip Quirke, Clement Neo, Fazl Barez
TL;DR
The paper demonstrates that small transformers trained from scratch can implement exact $n$-digit addition and subtraction using interpretable cascading carry/borrow circuits, achieving 99.999% accuracy on millions of questions. It introduces a formal mathematical framework with subtasks, presents addition, subtraction, and mixed algorithms, and validates them via ablations and targeted interventions across 49 models. An automated interpretability toolkit and a reproducible workflow enable precise node-level verification of the algorithmic structure. A broad survey of 180 public LLMs shows that only about 7% reliably perform addition, underscoring the gap between specialized tiny models and production LLMs. The work suggests that exact arithmetic can be realized in compact transformers and offers a tractable case study for mechanistic interpretability, with potential implications for understanding and improving arithmetic in larger models.
Abstract
Transformers are widely deployed in large language models (LLMs), yet most models still fail on basic arithmetic tasks such as multi-digit addition. In contrast, we show that small transformers trained from scratch can solve $n$-digit addition and subtraction with 99.999% accuracy. Building directly on prior work that uncovered addition circuits, we extend the analysis to subtraction and present a unified mechanistic account based on cascading carry and borrow circuits. Using a suite of 49 trained models, we apply systematic ablations and node-level constraints to validate the learned mechanisms and release a reproducible interpretability toolkit for studying arithmetic circuits. Finally, surveying 180 publicly available LLMs, we find that only 7% can reliably perform addition, underscoring the gap between specialized small models and general-purpose LLMs. Our results show that arithmetic can be implemented exactly by tiny transformers, offering a tractable case study for mechanistic interpretability and a cautionary contrast with the persistent arithmetic failures of much larger models.
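To make the phrase "cascading carry and borrow" concrete, the following is a minimal reference sketch of digit-wise addition and subtraction in plain Python. It is not the learned circuit and not the paper's notation: the function names and the "generate"/"propagate" terminology are assumptions chosen for illustration. The point it shows is that a carry (or borrow) at one digit position can ripple through a run of neighbouring positions, which is why a model must combine per-digit information across positions to get the answer exactly right.

```python
def to_digits(value: int, n_digits: int) -> list[int]:
    """Split a non-negative integer into n_digits digits, least significant first."""
    return [(value // 10**i) % 10 for i in range(n_digits)]


def from_digits(digits: list[int]) -> int:
    """Inverse of to_digits (digits are least significant first)."""
    return sum(d * 10**i for i, d in enumerate(digits))


def add_with_cascading_carry(a: int, b: int, n_digits: int) -> int:
    """Add two n_digits-digit numbers using explicit per-digit carry logic."""
    pair_sums = [x + y for x, y in zip(to_digits(a, n_digits), to_digits(b, n_digits))]
    generates = [s >= 10 for s in pair_sums]   # this digit pair creates a carry by itself
    propagates = [s == 9 for s in pair_sums]   # this digit pair passes an incoming carry on

    # carry_in[i] is the carry arriving at position i; a carry can cascade
    # through any run of positions whose digit sum is exactly 9.
    carry_in = [False] * (n_digits + 1)
    for i in range(n_digits):
        carry_in[i + 1] = generates[i] or (propagates[i] and carry_in[i])

    answer = [(pair_sums[i] + carry_in[i]) % 10 for i in range(n_digits)]
    answer.append(int(carry_in[n_digits]))     # possible extra leading digit
    return from_digits(answer)


def subtract_with_cascading_borrow(a: int, b: int, n_digits: int) -> int:
    """Compute a - b for a >= b using explicit per-digit borrow logic."""
    if a < b:
        raise ValueError("this sketch assumes a >= b")
    pair_diffs = [x - y for x, y in zip(to_digits(a, n_digits), to_digits(b, n_digits))]
    generates = [d < 0 for d in pair_diffs]    # this digit pair needs a borrow by itself
    propagates = [d == 0 for d in pair_diffs]  # this digit pair passes an incoming borrow on

    # borrow_in[i] is the borrow arriving at position i; a borrow can cascade
    # through any run of positions whose digit difference is exactly 0.
    borrow_in = [False] * (n_digits + 1)
    for i in range(n_digits):
        borrow_in[i + 1] = generates[i] or (propagates[i] and borrow_in[i])

    # Python's % returns a non-negative result for a positive modulus.
    answer = [(pair_diffs[i] - borrow_in[i]) % 10 for i in range(n_digits)]
    return from_digits(answer)


print(add_with_cascading_carry(445, 555, 3))        # 1000: carry cascades through two 9s
print(subtract_with_cascading_borrow(1000, 1, 4))   # 999: borrow cascades through three 0s
```

The two examples at the end are the hard cases for any digit-wise scheme: in 445 + 555 the carry created at the units position travels through every higher position, and in 1000 - 1 the borrow does the same.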
