Learning Modular Exponentiation with Transformers
David Demitri Africa, Sara M. Kapoor, Theo Simon Sorg, Challenger Mishra
TL;DR
The paper investigates how transformers learn modular exponentiation, treating interpretability as a core goal. The authors train a 4-layer encoder-decoder Transformer on $a^b \equiv d \pmod c$, using reciprocal operand sampling and base-$B$ digit representations, and report robust generalization together with grokking-like accuracy surges across related moduli. A key finding is that a minimal circuit composed only of final-layer attention heads suffices for regular exponentiation, suggesting specialized high-level computation rather than distributed symbolic processing. These results advance mechanistic interpretability in neural arithmetic by exhibiting both concrete learning dynamics and identifiable circuits. The setting is synthetic and small-scale, and the results point to avenues for scaling to cryptographic-strength inputs.
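To make the task concrete, the following is a minimal sketch of how one training example for $a^b \equiv d \pmod c$ might be built with base-$B$ digit sequences. The function names (`to_digits`, `mod_exp`, `make_example`), the most-significant-first digit order, and the input layout are illustrative assumptions, not the authors' tokenization or data pipeline.

```python
from typing import Dict, List


def to_digits(n: int, base: int) -> List[int]:
    """Return the digits of a non-negative integer, most significant first.

    The digit order is an assumption made for this sketch.
    """
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(r)
    return digits[::-1]


def mod_exp(a: int, b: int, c: int) -> int:
    """Compute a**b mod c by square-and-multiply (the label d of the task)."""
    if b < 0 or c < 1:
        raise ValueError("need b >= 0 and c >= 1")
    result = 1 % c  # handles c == 1, where every residue is 0
    a %= c
    while b > 0:
        if b & 1:                 # current exponent bit is set
            result = (result * a) % c
        a = (a * a) % c           # square the base for the next bit
        b >>= 1
    return result


def make_example(a: int, b: int, c: int, base: int) -> Dict[str, list]:
    """Hypothetical example layout: digit lists for (a, b, c) and for d."""
    d = mod_exp(a, b, c)
    return {
        "input": [to_digits(x, base) for x in (a, b, c)],
        "target": to_digits(d, base),
    }


example = make_example(7, 5, 13, base=10)
# 7**5 = 16807 and 16807 mod 13 = 11, so the target digits are [1, 1]
```

Python's built-in `pow(a, b, c)` computes the same value; the explicit loop is shown only to expose the arithmetic structure the model has to reproduce.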
Abstract
Modular exponentiation is crucial to number theory and cryptography, yet remains largely unexplored from a mechanistic interpretability standpoint. We train a 4-layer encoder-decoder Transformer model to perform this operation and investigate the emergence of numerical reasoning during training. Utilizing principled sampling strategies, PCA-based embedding analysis, and activation patching, we examine how number-theoretic properties are encoded within the model. We find that reciprocal operand training leads to strong performance gains, with sudden generalization across related moduli. These synchronized accuracy surges reflect grokking-like dynamics, suggesting the model internalizes shared arithmetic structure. We also find a subgraph consisting entirely of attention heads in the final layer sufficient to achieve full performance on the task of regular exponentiation. These results suggest that transformer models learn modular arithmetic through specialized computational circuits, paving the way for more interpretable and efficient neural approaches to modular exponentiation.
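As an illustration of the activation patching idea mentioned above, here is a toy sketch in plain Python. The "model" is a hypothetical sum of two named components standing in for attention heads; it is not the paper's Transformer, and the names `COMPONENTS`, `run`, and `patching_effect` are assumptions introduced for this example. The procedure is the generic one: cache activations from a clean run, splice one of them into a corrupted run, and measure how much of the clean output is recovered.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy components: "h0" depends on the input, "h1" is constant.
COMPONENTS: Dict[str, Callable[[float], float]] = {
    "h0": lambda x: 2.0 * x,
    "h1": lambda x: 1.0,
}


def run(x: float, patch: Optional[Dict[str, float]] = None) -> Tuple[float, Dict[str, float]]:
    """Run the toy model; `patch` overrides the named component activations.

    Returns the output (sum of component activations) and the activation cache.
    """
    cache: Dict[str, float] = {}
    for name, fn in COMPONENTS.items():
        act = fn(x)
        if patch is not None and name in patch:
            act = patch[name]  # splice in the activation from another run
        cache[name] = act
    return sum(cache.values()), cache


def patching_effect(clean_x: float, corrupt_x: float) -> Dict[str, float]:
    """Fraction of the clean-vs-corrupted output gap restored per component."""
    clean_out, clean_cache = run(clean_x)
    corrupt_out, _ = run(corrupt_x)
    gap = clean_out - corrupt_out
    if gap == 0:
        raise ValueError("clean and corrupted outputs are identical")
    effects = {}
    for name in COMPONENTS:
        patched_out, _ = run(corrupt_x, patch={name: clean_cache[name]})
        effects[name] = (patched_out - corrupt_out) / gap
    return effects


effects = patching_effect(clean_x=3.0, corrupt_x=0.0)
# "h0" carries all input dependence here, so it restores the full gap (1.0),
# while patching "h1" changes nothing (0.0).
```

In a real model the components interact through later layers, so recovery fractions need not be 0 or 1 and need not sum to 1; the toy is additive only to keep the arithmetic transparent.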
