Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks

Gavin McCracken, Gabriela Moisescu-Pareja, Vincent Letourneau, Doina Precup, Jonathan Love

TL;DR

The paper tackles why diverse neural networks solve modular addition in similar ways, proposing a universal abstract algorithm, the approximate CRT ($\mathcal{ACRT}$), that underpins solutions across architectures. It combines a simple sinusoidal-neuron model with a formalization of approximate cosets and a CRT-inspired decomposition, proving that deep networks can instantiate the $\mathcal{ACRT}$ using $\mathcal{O}(\log n)$ frequencies and that margins grow as $\Omega(\log n)$. Empirically, it demonstrates that neurons across 1- to 4-layer MLPs and transformers align with approximate coset structure, and that depth and trainable embeddings influence frequency counts while preserving the abstract template. The work thus provides a theory-backed interpretation of modular addition in multilayer networks, offering a path toward generalizable interpretability and a testable universality hypothesis for group multiplication tasks. Overall, it unifies prior, seemingly divergent mechanisms into a single high-level algorithm with broad implications for understanding and comparing internal representations in neural networks.

Abstract

We propose a testable universality hypothesis, asserting that seemingly disparate neural network solutions observed in the simple task of modular addition are unified under a common abstract algorithm. While prior work interpreted variations in neuron-level representations as evidence for distinct algorithms, we demonstrate, through multi-level analyses spanning neurons, neuron clusters, and entire networks, that multilayer perceptrons and transformers universally implement the abstract algorithm we call the approximate Chinese Remainder Theorem. Crucially, we introduce approximate cosets and show that neurons activate exclusively on them. Furthermore, our theory works for deep neural networks (DNNs). It predicts that universally learned solutions in DNNs with trainable embeddings or more than one hidden layer require only $\mathcal{O}(\log n)$ features, a result we empirically confirm. This work thus provides the first theory-backed interpretation of multilayer networks solving modular addition. It advances generalizable interpretability and opens a testable universality hypothesis for group multiplication beyond modular addition.
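
For orientation, the exact Chinese Remainder Theorem that the approximate version is named after can be sketched in a few lines. This is only the classical, exact CRT applied to modular addition, not the paper's approximate algorithm; the function name `crt_add` and the choice of moduli 6 and 11 (so that $n = 66$) are illustrative assumptions.

```python
from math import gcd


def crt_add(a: int, b: int, moduli: list[int]) -> int:
    """Compute (a + b) mod prod(moduli) using only per-modulus residues.

    Illustrative helper (not from the paper): classical exact CRT,
    which requires the moduli to be pairwise coprime.
    """
    n = 1
    for m in moduli:
        # Each new modulus must be coprime to the product so far.
        assert gcd(n, m) == 1, "moduli must be pairwise coprime"
        n *= m

    # Add independently inside each small modulus.
    residues = [(a % m + b % m) % m for m in moduli]

    # Recombine the residues into a single answer modulo n.
    total = 0
    for m, r in zip(moduli, residues):
        partial = n // m
        # pow(x, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % n


# Example: addition mod 66 reconstructed from residues mod 6 and mod 11.
print(crt_add(40, 50, [6, 11]), (40 + 50) % 66)
```

The point of the sketch is that the sum modulo a large $n$ is determined by sums modulo a few smaller moduli, which is the kind of decomposition the abstract refers to.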

Paper Structure

This paper contains 46 sections, 5 theorems, 24 equations, 46 figures, 6 tables, 1 algorithm.

Key Result

Theorem 4.4

Simple neurons in layer 1 activate (ReLU $>0$) on an approximate coset containing the correct answer $c$, by concentrating their preactivations on approximate cosets that contain $a$ and $b$; all neurons in later hidden layers activate on linear combinations of approximate cosets.

Figures (46)

  • Figure 1: Preactivation values over $a$, with $b=5$ fixed, for $c=(a+b)\bmod 59$, of a neuron from an MLP, a pizza, and a clock. The three show qualitative equivalence after remapping (Def. 4.2): they all have frequency 1.
  • Figure 2: Visualizing how neurons learn approximate coset structure. Panel 1 shows the circle graph on 66 elements generated by starting at $a=0$ and taking 6 steps of $\pm11$, creating the $\frac{66}{11}=6$ cosets of points $\{a \equiv 0 \pmod 6\}, \{a \equiv 1 \pmod 6\},\dots,\{a \equiv 5 \pmod 6\}$. The graph distance to each coset from the coset $\{a \equiv 0 \pmod 6\}$ (in yellow) is given. Panel 2: the neuron learned $\cos(\frac{11(2\pi)a}{66})$; the distances annotated on points follow from Panel 1. This neuron only activates (ReLU $>0$) on distances 0 and 1. Panel 3: remapping shows that all members of each coset collapse into an equivalence class. Panels 4-6 show the circle graph on 67 elements generated by $\pm11$; since $\gcd(11,67) = 1$, the neuron can't activate at the same strength on equivalent points (cosets) and instead activates with strengths proportional to distances on the Cayley graph. All elements on which the neuron takes positive values form an approximate coset, shown in bright viridis colors decaying with distance.
  • Figure 3: The number of frequencies found in clocks, pizzas, and MLPs as the modulus $n$ increases. We plot the data on logarithmic and linear axes, showing logarithmic fits have very high $R^2$ scores.
  • Figure 4: Average number of frequencies found in MLPs over moduli 59-66.
  • Figure 5: Left: cluster preactivations from a clock with small but present secondary spikes in the DFT. Right: as the width of the models is increased, the secondary spikes fade. The "2 sines" fit is a sum of two sines with different frequencies $f$, which allows a secondary peak to be included in the fit.
  • ...and 41 more figures
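
The contrast described in the Figure 2 caption can be checked numerically. The sketch below is a minimal illustration, assuming a neuron whose preactivation is exactly $\cos(\frac{2\pi f a}{n})$ with $f = 11$, as in the caption; the helper names `preactivation`, `positive_support`, and `phase_offsets` are hypothetical, and the "phase offset" is a simple proxy for how many distinct activation strengths occur, not the paper's Cayley-graph distance.

```python
import math


def preactivation(n: int, f: int, a: int) -> float:
    """cos(2*pi*f*a/n); f*a is reduced mod n first to keep the angle small."""
    return math.cos(2 * math.pi * ((f * a) % n) / n)


def positive_support(n: int, f: int) -> list[int]:
    """Inputs a in Z_n on which the neuron is active (ReLU > 0)."""
    return [a for a in range(n) if preactivation(n, f, a) > 0]


def phase_offsets(n: int, f: int) -> list[int]:
    """Distinct circular offsets min(k, n - k), with k = f*a mod n, on the support.

    Hypothetical proxy: equal offsets give equal activation strength,
    so the number of offsets counts the distinct strengths.
    """
    offsets = set()
    for a in positive_support(n, f):
        k = (f * a) % n
        offsets.add(min(k, n - k))
    return sorted(offsets)


for n in (66, 67):
    support = positive_support(n, 11)
    print(f"n={n}: {len(support)} active inputs, "
          f"{len(phase_offsets(n, 11))} distinct activation strengths")
```

With $n = 66$ the active inputs are exactly those with $a \bmod 6 \in \{0, 1, 5\}$, and they take only two distinct strengths, so whole cosets light up together. With $n = 67$ the same number of inputs is active, but since $\gcd(11, 67) = 1$ they spread over 17 distinct strengths, which is the graded pattern the caption calls an approximate coset.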

Theorems & Definitions (28)

  • Definition 4.1: Step size
  • Definition 4.2: Remapping: frequency normalization
  • Definition 4.3: Approximate cosets
  • Theorem 4.4
  • Remark 4.5
  • Theorem 4.7
  • Corollary 4.8
  • Conjecture 4.9
  • Definition A.1: Group
  • Definition A.2: Subgroup
  • ...and 18 more