Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks
Gavin McCracken, Gabriela Moisescu-Pareja, Vincent Letourneau, Doina Precup, Jonathan Love
TL;DR
The paper asks why diverse neural networks solve modular addition in similar ways, and proposes a universal abstract algorithm, the approximate Chinese Remainder Theorem ($\mathcal{ACRT}$), that underpins solutions across architectures. It combines a simple sinusoidal-neuron model with a formalization of approximate cosets and a CRT-inspired decomposition, proving that deep networks can instantiate the $\mathcal{ACRT}$ using $\mathcal{O}(\log n)$ frequencies and that margins grow as $\Omega(\log n)$. Empirically, it shows that neurons in 1- to 4-layer MLPs and transformers align with approximate coset structure, and that depth and trainable embeddings change the number of frequencies learned while preserving the abstract template. The work thus provides a theory-backed interpretation of modular addition in multilayer networks, offering a path toward generalizable interpretability and a testable universality hypothesis for group multiplication tasks. Overall, it unifies prior, seemingly divergent mechanisms into a single high-level algorithm, with implications for understanding and comparing internal representations in neural networks.
Abstract
We propose a testable universality hypothesis, asserting that seemingly disparate neural network solutions observed in the simple task of modular addition are unified under a common abstract algorithm. While prior work interpreted variations in neuron-level representations as evidence for distinct algorithms, we demonstrate through multi-level analyses (spanning neurons, neuron clusters, and entire networks) that multilayer perceptrons and transformers universally implement the abstract algorithm we call the approximate Chinese Remainder Theorem. Crucially, we introduce approximate cosets and show that neurons activate exclusively on them. Furthermore, our theory applies to deep neural networks (DNNs). It predicts that universally learned solutions in DNNs with trainable embeddings or more than one hidden layer require only $\mathcal{O}(\log n)$ features, a result we empirically confirm. This work thus provides the first theory-backed interpretation of multilayer networks solving modular addition. It advances generalizable interpretability and opens a testable universality hypothesis for group multiplication beyond modular addition.
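To make the role of a small set of frequencies concrete, here is a minimal Python sketch of a sum-of-cosines readout for modular addition. It is an illustration under stated assumptions, not the paper's $\mathcal{ACRT}$ construction or its trained networks: the modulus `n` is assumed prime, the `ceil(log2(n))` frequencies are drawn at random with a fixed seed, and the idealized cosine features stand in for learned neurons. The names `make_scores` and `predict` are hypothetical.

```python
import math
import random


def make_scores(n, freqs):
    """Tabulate score[d] = sum over f of cos(2*pi*f*d/n), where d = (a + b - c) mod n."""
    return [
        sum(math.cos(2 * math.pi * f * d / n) for f in freqs)
        for d in range(n)
    ]


def predict(a, b, n, scores):
    """Return the candidate c whose summed-cosine logit is largest."""
    return max(range(n), key=lambda c: scores[(a + b - c) % n])


n = 113                      # assumed prime modulus (illustrative choice)
k = math.ceil(math.log2(n))  # on the order of log n frequencies
rng = random.Random(0)       # fixed seed so the sketch is reproducible
freqs = rng.sample(range(1, n), k)  # distinct nonzero frequencies

scores = make_scores(n, freqs)

# Gap between the correct answer's logit and the best wrong answer's logit.
margin = scores[0] - max(scores[1:])

print("frequencies:", freqs)
print("margin:", margin)
print("predict(100, 50):", predict(100, 50, n, scores), "expected:", (100 + 50) % n)
```

Because `n` is prime and every frequency is nonzero, each cosine equals 1 only when `c` matches `(a + b) % n`, so the correct answer always has the strictly largest logit. The size of the printed margin depends on which frequencies are drawn; the $\Omega(\log n)$ margin bound refers to the paper's own construction, which this sketch does not reproduce.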
