IntSeqBERT: Learning Arithmetic Structure in OEIS via Modulo-Spectrum Embeddings

Kazuhisa Nakasho

TL;DR

Modulo spectrum analysis reveals a strong negative correlation between Normalised Information Gain (NIG) and Euler's totient ratio $\varphi(m)/m$ ($r = -0.851$), providing empirical evidence that composite moduli capture OEIS arithmetic structure more efficiently via CRT aggregation.

Abstract

Integer sequences in the OEIS span values from single-digit constants to astronomical factorials and exponentials, making prediction challenging for standard tokenised models that cannot handle out-of-vocabulary values or exploit periodic arithmetic structure. We present IntSeqBERT, a dual-stream Transformer encoder for masked integer-sequence modelling on OEIS. Each sequence element is encoded along two complementary axes: a continuous log-scale magnitude embedding and sin/cos modulo embeddings for 100 residues (moduli $2$--$101$), fused via FiLM. Three prediction heads (magnitude regression, sign classification, and modulo prediction for 100 moduli) are trained jointly on 274,705 OEIS sequences. At the Large scale (91.5M parameters), IntSeqBERT achieves 95.85% magnitude accuracy and 50.38% Mean Modulo Accuracy (MMA) on the test set, outperforming a standard tokenised Transformer baseline by $+8.9$ pt and $+4.5$ pt, respectively. An ablation removing the modulo stream confirms it accounts for $+15.2$ pt of the MMA gain and contributes an additional $+6.2$ pt to magnitude accuracy. A probabilistic Chinese Remainder Theorem (CRT)-based Solver converts the model's predictions into concrete integers, yielding a 7.4-fold improvement in next-term prediction over the tokenised-Transformer baseline (Top-1: 19.09% vs. 2.59%). Modulo spectrum analysis reveals a strong negative correlation between Normalised Information Gain (NIG) and Euler's totient ratio $\varphi(m)/m$ ($r = -0.851$, $p < 10^{-28}$), providing empirical evidence that composite moduli capture OEIS arithmetic structure more efficiently via CRT aggregation.
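
The modulo stream and the CRT step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: each residue is mapped to the angle $2\pi (n \bmod m)/m$ before taking sin/cos, and the helper names `modulo_features` and `crt` are hypothetical. The `crt` function is the textbook deterministic Chinese Remainder Theorem for pairwise-coprime moduli, not the paper's probabilistic Solver.

```python
import math

# The 100 moduli 2..101 named in the abstract.
MODULI = list(range(2, 102))


def modulo_features(n):
    """Return a 200-dim list: (sin, cos) of 2*pi*(n mod m)/m for each modulus m.

    Assumption: the angle mapping is illustrative; the paper's exact
    feature definition may differ.
    """
    feats = []
    for m in MODULI:
        angle = 2.0 * math.pi * (n % m) / m
        feats.extend((math.sin(angle), math.cos(angle)))
    return feats


def crt(residues, moduli):
    """Deterministic CRT for pairwise-coprime moduli.

    Returns (x, M) where M is the product of the moduli and x is the
    unique value in [0, M) matching every residue.
    """
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        # Find t with x + M*t congruent to r (mod m); needs gcd(M, m) == 1.
        t = ((r - x) * pow(M, -1, m)) % m
        x += M * t
        M *= m
    return x, M


if __name__ == "__main__":
    print(len(modulo_features(12345)))  # 200 features, matching R^200 in Figure 1
    coprime = (7, 9, 11, 13)
    print(crt([1234 % m for m in coprime], coprime))  # (1234, 9009)
```

Because $7 \cdot 9 \cdot 11 \cdot 13 = 9009$, the four residues pin down any integer below 9009 exactly; this is the sense in which per-modulus predictions can be aggregated into a concrete integer.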

Paper Structure (27 sections, 4 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: IntSeqBERT architecture. The Dual-Stream Embedding block projects $\mathbf{f}_i^{\mathrm{mag}}\!\in\!\mathbb{R}^4$ and $\mathbf{f}_i^{\mathrm{mod}}\!\in\!\mathbb{R}^{200}$ to $\mathbb{R}^d$ via $\mathrm{MLP}_{\mathrm{mag}}$ and $W_{\mathrm{mod}}$, then fuses them with FiLM: $\mathbf{e}_i = (1+\boldsymbol{\gamma}_i)\odot\mathbf{h}_i^{\mathrm{mag}}+\boldsymbol{\beta}_i$. Positional encodings are added before the Pre-LN Transformer encoder. Three prediction heads produce $\hat{v},\log\hat{\sigma}^2\!\in\!\mathbb{R}$ (magnitude regression), $\hat{s}\!\in\!\{+,-,0\}$ (sign classification), and $\hat{r}^{(m)}\!\in\!\{0,\ldots,m{-}1\}$ ($100\times$ modulo classification).
  • Figure 2: Validation loss learning curves for all scales (Small / Middle / Large) and all variants. IntSeqBERT (solid blue) consistently achieves lower loss than Vanilla (dashed orange) and Ablation (dash-dot green). At the Large scale, IntSeqBERT converges to Val Loss = 1.01 at epoch 200.
  • Figure 3: Predicted vs. true magnitude ($\log_{10}$ scale) for Large models. Points are coloured by bucket (Small=blue circle, Medium=green circle, Large=yellow-orange square, Huge=red triangle, Astronomical=purple diamond). IntSeqBERT achieves $R^2 = 0.988$ vs. Vanilla $R^2 = 0.943$; Vanilla shows pronounced scatter above the Large bucket.
  • Figure 4: NIG spectrum for moduli $m = 2, \ldots, 101$ (Large models). Grey shading marks prime moduli. The 95% CI for IntSeqBERT (light blue band) is computed by bootstrapping.
  • Figure 5: NIG vs. Euler's totient ratio $\varphi(m)/m$ (Large IntSeqBERT). Composite moduli (blue circles, shade proportional to $m$) and prime moduli (red triangles) are shown separately. The regression line (grey dashed) indicates Pearson correlation of $r = -0.851$ ($p < 10^{-28}$). Notable moduli $m = 2$ (parity), $m = 60$ (Babylonian number), and $m = 96$ (highly composite) are annotated.
  • ...and 1 more figure
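
The totient ratio $\varphi(m)/m$ on the horizontal axis of Figure 5 can be computed directly. The sketch below is only an illustration of that quantity for the moduli $2, \ldots, 101$; the function names are hypothetical and it does not reproduce the NIG values or the reported correlation.

```python
from fractions import Fraction


def totient(m):
    """Euler's totient via trial-division factorisation of m."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p  # multiply result by (1 - 1/p)
        p += 1
    if n > 1:  # one prime factor larger than sqrt(m) remains
        result -= result // n
    return result


def totient_ratio(m):
    """Exact ratio phi(m)/m as a Fraction."""
    return Fraction(totient(m), m)


if __name__ == "__main__":
    ratios = {m: totient_ratio(m) for m in range(2, 102)}
    for m in (2, 60, 96, 97):
        print(m, ratios[m], float(ratios[m]))
    lowest = min(ratios.values())
    print([m for m, r in ratios.items() if r == lowest])  # [30, 60, 90]
```

For a prime $p$ the ratio is $(p-1)/p$, which approaches 1, whereas moduli divisible by several small primes have low ratios: $m = 60$ gives $4/15$ and $m = 96$ gives $1/3$. Given the negative correlation reported above, these low-ratio composite moduli are the ones associated with higher NIG.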