Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks

Artur Back de Luca, George Giapitzakis, Kimon Fountoulakis

TL;DR

This work investigates whether neural networks can learn to execute discrete binary algorithms exactly by analyzing two-layer networks in the NTK regime. It introduces a template-matching framework that encodes local bitwise rules into a block-structured training set, enabling exact algorithmic execution (permutations, binary addition, binary multiplication, and SBN instructions) with logarithmically many examples via an ensemble of infinite-width networks. The authors prove NTK-based exact learnability under controlled interference, and extend the results to high-probability guarantees using ensemble averaging, establishing ensemble size bounds that scale polynomially with bit-length for the studied tasks. The work also discusses limitations (orthogonality assumptions, bounded memory) and outlines future directions toward architectures capable of handling longer or variable-length inputs, such as RNNs, Transformers, or GNNs, while preserving the theoretical framework. Overall, the paper provides formal guarantees for exact neural execution of fundamental algorithms in a controlled NTK setting, offering insight into how discrete computations can be embedded and learned in neural systems with provable properties.

Abstract

Neural networks are known for their ability to approximate smooth functions, yet they fail to generalize perfectly to unseen inputs when trained on discrete operations. Such operations lie at the heart of algorithmic tasks such as arithmetic, which is often used as a test bed for algorithmic execution in neural networks. In this work, we ask: can neural networks learn to execute binary-encoded algorithmic instructions exactly? We use the Neural Tangent Kernel (NTK) framework to study the training dynamics of two-layer fully connected networks in the infinite-width limit and show how a sufficiently large ensemble of such models can be trained to execute exactly, with high probability, four fundamental tasks: binary permutations, binary addition, binary multiplication, and Subtract and Branch if Negative (SBN) instructions. Since SBN is Turing-complete, our framework extends to computable functions. We show how this can be efficiently achieved using only logarithmically many training data. Our approach relies on two techniques: structuring the training data to isolate bit-level rules, and controlling correlations in the NTK regime to align model predictions with the target algorithmic executions.
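To make the SBN task concrete, here is a minimal interpreter sketch. It assumes one common convention for the instruction, `(a, b, target)` meaning `mem[b] -= mem[a]` followed by a jump to `target` if the result is negative; the paper's exact encoding may differ. The function name `run_sbn`, the memory layout, and the step cap are all hypothetical choices for illustration.

```python
def run_sbn(program, memory, max_steps=1000):
    """Run a list of (a, b, target) SBN instructions on a copy of `memory`.

    Assumed semantics: mem[b] -= mem[a]; if mem[b] < 0, jump to `target`,
    otherwise fall through to the next instruction. Execution halts when
    the program counter leaves the program or the step cap is reached.
    """
    mem = list(memory)
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break
        a, b, target = program[pc]
        mem[b] -= mem[a]
        pc = target if mem[b] < 0 else pc + 1
    return mem


# Addition built from two subtractions: z = 0 - x, then y = y - z = y + x.
# Memory layout (hypothetical): index 0 holds x, index 1 holds y, index 2 is scratch.
add_program = [(0, 2, 1), (2, 1, 2)]
print(run_sbn(add_program, [3, 4, 0]))  # [3, 7, -3]
```

Both branch targets in `add_program` point at the next instruction, so the branch has no effect here; loops and conditionals would use targets that differ from the fall-through address.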

Paper Structure

This paper contains 31 sections, 6 theorems, 68 equations, 7 figures, 1 algorithm.

Key Result

Theorem 3.1

Let $\mathcal{X}$ and $\mathcal{Y}$ be the training dataset (training inputs and ground truth labels, respectively). Assume that $\Theta:=\Theta(\mathcal{X},\mathcal{X})$ is positive definite. Suppose the network is trained with gradient descent (with small-enough step-size) or gradient flow to mini where $\mathcal{Y}$ in eq:mean_out denotes the vectorization of all vectors $\boldsymbol{y} \in \ma
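The statement above is cut off in this listing, and the sketch below does not attempt to restore it. As a rough illustration of the kind of kernel predictor involved, the Figure 4 caption describes multiplying the test kernel $\Theta(\hat{\boldsymbol{x}}, \mathcal{X})$ by an inverse Gram matrix of the form "scaled identity plus rank-1 perturbation" and then by $\mathcal{Y}$. The code assumes a Gram matrix $aI + b\,\mathbf{1}\mathbf{1}^\top$, labels in $\{-1, +1\}$, and toy kernel values; all of these numbers and the function names are hypothetical, and the inverse uses the standard Sherman-Morrison identity.

```python
def row_times_inverse(k, a, b):
    """Return k @ inv(a*I + b*ones*ones^T) via the Sherman-Morrison identity."""
    n = len(k)
    shrink = b * sum(k) / (a + n * b)
    return [(ki - shrink) / a for ki in k]


def predict(k, Y, a, b):
    """Kernel predictor k @ inv(Theta) @ Y for Theta = a*I + b*ones*ones^T."""
    w = row_times_inverse(k, a, b)
    return [sum(w[i] * Y[i][j] for i in range(len(Y))) for j in range(len(Y[0]))]


# Toy setup: four training rows; the test input matches row 2 (large kernel
# value) and has a small uniform similarity to the unmatched rows.
Y = [[1, -1, 1], [-1, -1, 1], [1, 1, -1], [-1, 1, 1]]
k = [0.1, 0.1, 0.6, 0.1]
a, b = 1.0, 0.1

pred = predict(k, Y, a, b)
signs = [1 if p > 0 else -1 for p in pred]
print(signs)  # [1, 1, -1], the labels of the matched row
```

In this toy case the unmatched rows contribute at most about 0.11 per coordinate against roughly 0.54 from the matched row, so the sign of every coordinate follows the matched label.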

Figures (7)

  • Figure 1: Simplified illustration of the framework used in our analysis. The left panel shows an example algorithm (binary addition) where each function, highlighted in blue and red, is translated into binary training instructions shown in the central panel with matching colors. Each instruction specifies a condition over part of the current algorithm state and maps it to a corresponding output. Instructions are grouped into blocks, indicated by boxed column labels in $\mathcal{X}$ and $\mathcal{Y}$, each representing a subset of the input state. For binary addition, some blocks represent segments of the summands, while others reflect the carry state. For the applications discussed in \ref{sec:behavior}, this block structure allows the number of instructions in $\mathcal{X}$ and $\mathcal{Y}$ to scale linearly with bit length $\ell$. The right-most panel shows how instructions are used within an iterative framework to update the state vector $\hat{\boldsymbol{x}}_i$, which serves as input to the neural network at the $i$-th step. The state is first encoded, as described in \ref{sec:ntk_learnability}, before being passed to the model. In the NTK regime, we show that the network performs template matching against training samples to execute the appropriate instructions. As $\hat{\boldsymbol{x}}_i$ evolves, it activates new templates, progressing through the algorithm. Predictions are rounded at each step to mitigate noise, and repeating this process reproduces the algorithm’s full execution.
  • Figure 2: Illustration of the addition algorithm based on the template matching approach from \ref{sec:instructions}. Two $\ell=2$ bit numbers, $p=2 \ (\textrm{or } 10_2)$ and $q=3 \ (\textrm{or } 11_2)$, are added by organizing their bits and carries into blocks $B_i$. Blocks $B_2$ and $B_4$ represent the bits of $p$ and $q$, while $B_1$ and $B_3$ handle the carries. The input $\hat{\boldsymbol{x}}$ is processed via template matching $f$, using templates $\mathcal{T}_i$, producing outputs $y^{(k)}_i$ used to compose the output. Although the method is iterative, this example completes in one step. The final result $5 \ (\textrm{or } 101_2)$ is stored at the most-significant carry bit and the bits of $p$ in $\hat{\boldsymbol{x}}$.
  • Figure 3: Visualization of the input specification of \ref{subseq:input} for binary summation of two $\ell=2$ bit numbers. On the left, we illustrate the block structure of a pre-encoded test sample. Each block should either be zero or match the corresponding block of an element (row) of $\mathcal{X}_{\text{init}}$. On the right, we showcase the encoding procedure that creates the training dataset. The initial examples (described in \ref{sec:instructions}) forming the rows of $\mathcal{X}_\text{init}$ are augmented and orthogonalized. Notice that the colored parts of each row of $\mathcal{X}_\textrm{init}$, along with the corresponding row of $\mathcal{Y}$, match the $\mathcal{T}_i$'s of \ref{fig:addition}. Also note that the orthogonalization presented here is only one of many possible ones. Finally, each row of $\mathcal{Y}$ depicts the corresponding ground-truth output for each training sample.
  • Figure 4: Illustration of the NTK predictor structure: inputs are first encoded (normalization omitted) to compute the test NTK $\Theta(\hat{\boldsymbol{x}}, \mathcal{X})$. Due to the test input structure, this kernel assumes two values based on matches between test and training inputs within blocks. Multiplying by $\Tilde{\Theta}^{-1}$ (which assumes the form of scaled identity plus a rank-1 noise perturbation colored black) re-weights these similarities, and the multiplication by $\mathcal{Y}$ gives the final prediction. When the contribution of the unmatched entries is controlled (black similarities), the sign of each coordinate matches the ground-truth output.
  • Figure 5: Numerical and theoretical estimates of ensemble complexity $N$ (in log-scale) for permutation, addition, and multiplication tasks as a function of bit length $\ell$. Ensemble complexity is computed via a union bound over all possible inputs and algorithmic executions for a given $\ell$. Inset blocks illustrate the ratio of variance to mean in \ref{eq:ensemble_uniform}, estimated using input size $k'$. This ratio increases linearly with $k'$ up to a constant. The same ratio is used in the theoretical estimate of $N$, which matches the numerical estimate in growth rate, differing only by a constant factor.
  • ...and 2 more figures
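
The Figure 2 example ($p=2$, $q=3$, result $5$) can be traced with a small sketch of addition driven purely by local bit rules. This is not the paper's template set: it assumes a plain full-adder lookup table as a stand-in for the templates, processes bits sequentially instead of in one matching step, and uses hypothetical helper names (`FULL_ADDER`, `add_bits`, `to_bits`, `from_bits`).

```python
# Local rule table: (p_bit, q_bit, carry_in) -> (sum_bit, carry_out).
FULL_ADDER = {
    (p, q, c): ((p + q + c) % 2, (p + q + c) // 2)
    for p in (0, 1) for q in (0, 1) for c in (0, 1)
}


def to_bits(n, length):
    """Little-endian bit list of n with the given length."""
    return [(n >> i) & 1 for i in range(length)]


def from_bits(bits):
    """Integer value of a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def add_bits(p_bits, q_bits):
    """Add two equal-length little-endian bit lists using only table lookups."""
    assert len(p_bits) == len(q_bits)
    out, carry = [], 0
    for p, q in zip(p_bits, q_bits):
        s, carry = FULL_ADDER[(p, q, carry)]
        out.append(s)
    return out + [carry]  # the final carry becomes the most-significant bit


result = add_bits(to_bits(2, 2), to_bits(3, 2))
print(result, from_bits(result))  # [1, 0, 1] 5
```

As in the caption, the answer ends up in the sum bits plus the most-significant carry: `[1, 0, 1]` read little-endian is $101_2 = 5$.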

Theorems & Definitions (12)

  • Theorem 3.1: Theorem 2.2 from NEURIPS2019_0d1a9651
  • Theorem 5.1: NTK predictor behavior
  • proof : Proof outline
  • Remark 5.1
  • Lemma 6.1
  • Remark 6.1
  • Theorem A.1: sherman
  • Lemma A.1
  • proof
  • Theorem C.1: NTK predictor behavior
  • ...and 2 more