Testing Transformer Learnability on the Arithmetic Sequence of Rooted Trees
Alessandro Breccia, Federica Gerace, Marco Lippi, Gabriele Sicuro, Pierluigi Contucci
TL;DR
The paper investigates whether a GPT-2-style transformer can learn the deterministic arithmetic text $\mathbb{N}\mathcal{T}$, obtained from the iterated prime factorization of the natural numbers, with each integer represented as a rooted tree and encoded as a Dyck word. A 12-layer GPT-2 model is trained on the first $10^{11}$ integers with two self-supervised tasks, Next-Word Prediction and Masked Language Modeling, and is compared against a Markov baseline. The results show partial learning of the internal grammar: the model outperforms the baselines and captures non-trivial regularities, although prime boundaries remain challenging because they depend on long-range structure that extends beyond the context window. The work suggests that transformers can learn arithmetic structure to some extent, and it motivates larger models for exploring global reasoning and latent representations of number theory.
Abstract
We study whether a Large Language Model can learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers. Each integer is mapped into a rooted planar tree, and the resulting sequence $\mathbb{N}\mathcal{T}$ defines an arithmetic text with measurable statistical structure. A transformer network (the GPT-2 architecture) is trained from scratch on the first $10^{11}$ elements, and its predictive ability is then tested on next-word and masked-word prediction tasks. Our results show that the model partially learns the internal grammar of $\mathbb{N}\mathcal{T}$, capturing non-trivial regularities and correlations. This suggests that learnability may extend beyond empirical data to the very structure of arithmetic.
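
To make the integer-to-tree map concrete, the following is a minimal sketch of one possible convention, not necessarily the encoding used by the authors. It assumes that each prime factor $p^e$ of $n$ contributes one child of the root, that the subtree under that child encodes the exponent $e$ recursively, and that the identity of the prime itself is discarded. The function names `factorize`, `dyck_word`, and `is_dyck` are hypothetical and introduced only for illustration.

```python
def factorize(n):
    """Return the prime factorization of n as (prime, exponent) pairs, primes increasing."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            factors.append((p, e))
        p += 1
    if n > 1:
        # Whatever remains is a single prime with exponent 1.
        factors.append((n, 1))
    return factors


def dyck_word(n):
    """Assumed encoding: each prime power p^e becomes '(' + dyck_word(e) + ')'.

    The integer 1 has no prime factors and maps to the empty word.
    Prime identity is dropped (an assumption), so distinct integers may share a word.
    """
    return "".join("(" + dyck_word(e) + ")" for _, e in factorize(n))


def is_dyck(word):
    """Check that a string of parentheses is balanced."""
    depth = 0
    for ch in word:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


if __name__ == "__main__":
    # Print the start of the text under the assumed convention.
    for n in range(1, 13):
        print(n, repr(dyck_word(n)))
```

Under this assumed convention, $12 = 2^2 \cdot 3$ yields two children, the first carrying the subtree of the exponent 2 and the second a bare leaf. Concatenating the words for consecutive integers then gives a text over a two-symbol alphabet on which next-word and masked-word tasks can be posed.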
