Understanding Addition and Subtraction in Transformers

Philip Quirke, Clement Neo, Fazl Barez

TL;DR

The paper demonstrates that small transformers trained from scratch can implement exact $n$-digit addition and subtraction using interpretable cascading carry/borrow circuits, achieving 99.999% accuracy on millions of questions. It introduces a formal mathematical framework with subtasks, presents addition, subtraction, and mixed algorithms, and validates them via ablations and targeted interventions across 49 models. An automated interpretability toolkit and a reproducible workflow enable precise node-level verification of the algorithmic structure. A broad survey of 180 public LLMs shows only about 7% reliably perform addition, underscoring a gap between specialized tiny models and production LLMs. The work suggests that exact arithmetic can be realized in compact transformers and offers a tractable mechanistic interpretability case with potential implications for understanding and improving arithmetic in larger models.

Abstract

Transformers are widely deployed in large language models (LLMs), yet most models still fail on basic arithmetic tasks such as multidigit addition. In contrast, we show that small transformers trained from scratch can solve n-digit addition and subtraction with 99.999% accuracy. Building directly on prior work that uncovered addition circuits, we extend the analysis to subtraction and present a unified mechanistic account based on cascading carry and borrow circuits. Using a suite of 49 trained models, we apply systematic ablations and node-level constraints to validate the learned mechanisms and release a reproducible interpretability toolkit for studying arithmetic circuits. Finally, surveying 180 publicly available LLMs, we find that only 7% can reliably perform addition, underscoring the gap between specialized small models and general-purpose LLMs. Our results show that arithmetic can be implemented exactly by tiny transformers, offering a tractable case study for mechanistic interpretability and a cautionary contrast with the persistent arithmetic failures of much larger models.

Paper Structure (30 sections, 5 equations, 18 figures, 8 tables)

Figures (18)

  • Figure 1: Our n-digit addition algorithm is mathematically sound. It uses 4 features: it calculates single-digit carry-one values ($ST_n$) and combines them into multidigit carry-one values ($SV_n$). Any uncertain (U) $SV_n$ values are refined to 0 or 1 over tokens (highlighted). By the “+” token, $SV2$ gives the A3 value as 0 or 1. The other answer digits $A_{n}$ are calculated from $SA_n$ and $SV_{n-1}$ values.
  • Figure 2: We score the addition and subtraction capability of 180 public LLMs. A score of 5 means the LLM added or subtracted two 5-digit numbers correctly but failed with 6-digit numbers. 7% of models on addition and 12% of models on subtraction reach the maximum test score of 15. The top models can call external tools.
  • Figure 3: For 5-digit addition and subtraction, our notation for the main input tokens is $D4$, ..., $D0$ and $D'4$, ..., $D'0$. For output tokens it is $A6$, ..., $A0$. For n-digits, we use the notation $D_{n}$, ..., $D0$, $D'_{n}$, ..., $D'0$ and $A_{n}$, ..., $A0$.
  • Figure 4: Our subtraction algorithm parallels our addition algorithm, but it refines “cascading borrow one” uncertainty over multiple tokens. The refined $MV2$ value determines both the answer sign (“+” or “-”) and which digit values ($MD_n$ or $ND_n$) to use for the final output.
  • Figure 5: The 5- to 15-digit, 2-layer, 3-head addition models have very low loss. Training takes longer as the number of digits grows. Details are in Table \ref{tab:AdditionModels}.
  • ...and 13 more figures
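
The carry refinement described in the Figure 1 caption can be sketched in plain Python. This is a minimal illustration, not the trained model's computation or the authors' code. It assumes that $ST_n$ is 1 when a digit pair sums to 10 or more, 0 when it sums to 8 or less, and U (uncertain) when it sums to exactly 9, that $SV_n$ resolves each U from the next lower digit, and that $SA_n$ is the digit-pair sum modulo 10. The function names `st`, `add_digits`, and `add` are hypothetical.

```python
from typing import List, Union

U = "U"  # uncertain: the pair sums to 9, so the carry depends on lower digits


def st(d: int, d_prime: int) -> Union[int, str]:
    """Assumed single-digit carry state (ST_n) for one digit pair: 1, 0, or U."""
    s = d + d_prime
    if s >= 10:
        return 1
    if s == 9:
        return U
    return 0


def add_digits(x: List[int], y: List[int]) -> List[int]:
    """Add two equal-length digit lists, least significant digit first.

    Returns n + 1 answer digits, least significant digit first.
    """
    assert len(x) == len(y) and len(x) > 0
    n = len(x)
    sa = [(a + b) % 10 for a, b in zip(x, y)]   # SA_n: base add, carry ignored
    st_vals = [st(a, b) for a, b in zip(x, y)]  # ST_n: 0, 1, or U per digit

    # SV_n: cascade upward from digit 0, replacing each U with the carry
    # already resolved for the digit below (nothing carries into digit 0).
    sv: List[int] = []
    carry_below = 0
    for state in st_vals:
        resolved = carry_below if state == U else state
        sv.append(resolved)
        carry_below = resolved

    # A_0 = SA_0; A_i = (SA_i + SV_{i-1}) mod 10; the top digit is SV_{n-1}.
    answer = [sa[0]] + [(sa[i] + sv[i - 1]) % 10 for i in range(1, n)]
    answer.append(sv[n - 1])
    return answer


def add(a: int, b: int, n: int) -> int:
    """Add two non-negative integers of at most n digits via add_digits."""
    x = [(a // 10**i) % 10 for i in range(n)]
    y = [(b // 10**i) % 10 for i in range(n)]
    return sum(d * 10**i for i, d in enumerate(add_digits(x, y)))
```

For example, `add(99999, 1, 5)` has a carry out of digit 0 and a U state at every higher digit, so the single carry cascades through all of them. This is the hard case the caption refers to when it says uncertain values are refined over several tokens.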