Exact Learning of Arithmetic with Differentiable Agents

Hristo Papazov, Francesco D'Angelo, Nicolas Flammarion

TL;DR

This work introduces Differentiable Finite-State Transducers (DFSTs) as a gradient-friendly, Turing-complete class that learns exact algorithmic behavior from policy-trajectory observations of expert grid agents. By training DFSTs to perform binary and decimal addition and multiplication, the authors demonstrate robust length generalization from tiny datasets, far exceeding training input lengths. The framework unifies a formalized learning environment with end-to-end differentiable training and establishes universality results, suggesting a feasible path toward exact gradient-based learning of algorithms. The findings highlight the importance of structured intermediate supervision and environment interaction for scaling to arbitrarily long inputs.

Abstract

We explore the possibility of exact algorithmic learning with gradient-based methods and introduce a differentiable framework capable of strong length generalization on arithmetic tasks. Our approach centers on Differentiable Finite-State Transducers (DFSTs), a Turing-complete model family that avoids the pitfalls of prior architectures by enabling constant-precision, constant-time generation, and end-to-end log-parallel differentiable training. Leveraging policy-trajectory observations from expert agents, we train DFSTs to perform binary and decimal addition and multiplication. Remarkably, models trained on tiny datasets generalize without error to inputs thousands of times longer than the training examples. These results show that training differentiable agents on structured intermediate supervision could pave the way towards exact gradient-based learning of algorithmic skills. Code available at https://github.com/dngfra/differentiable-exact-algorithmic-learner.git.
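To make "policy-trajectory observations from expert agents" more concrete, the following is a minimal illustrative sketch of a hand-written finite-state agent that adds two numbers on a symbolic grid and records every (state, observed symbol, written symbol, move) step it takes. It is not the authors' implementation: the grid layout (operands on rows 0 and 1, result on row 2, least significant digit in column 0), the state encoding, and the names `policy` and `run_expert` are all assumptions chosen for illustration. A recorded trajectory of this kind is the sort of intermediate supervision the abstract refers to.

```python
from typing import Dict, List, Optional, Tuple

BLANK = "_"
Pos = Tuple[int, int]            # (row, col) on the grid
State = Tuple[str, int, bool]    # (phase, running sum, digit seen in this column?)
Step = Tuple[State, str, Optional[str], Pos]  # (state, observation, write, move)


def policy(state: State, symbol: str, base: int = 2
           ) -> Tuple[State, Optional[str], Pos, bool]:
    """Hypothetical expert policy: returns (next state, write, move, halt)."""
    phase, acc, seen = state
    if phase in ("read_a", "read_b"):
        # Accumulate the digit under the agent, then step one row down.
        if symbol != BLANK:
            acc, seen = acc + int(symbol), True
        nxt = "read_b" if phase == "read_a" else "write"
        return (nxt, acc, seen), None, (1, 0), False
    # phase == "write": the agent sits on the result row.
    if not seen and acc == 0:
        return state, None, (0, 0), True  # both operands exhausted, no carry
    # Write the digit, keep the carry, jump to the top of the next column.
    return ("read_a", acc // base, False), str(acc % base), (-2, 1), False


def run_expert(a: str, b: str, base: int = 2) -> Tuple[str, List[Step]]:
    """Run the expert on operands a and b; return the sum and the trajectory."""
    grid: Dict[Pos, str] = {}
    for row, digits in enumerate((a, b)):
        for col, ch in enumerate(reversed(digits)):
            grid[(row, col)] = ch  # least significant digit at column 0
    pos: Pos = (0, 0)
    state: State = ("read_a", 0, False)
    trajectory: List[Step] = []
    while True:
        symbol = grid.get(pos, BLANK)
        new_state, write, move, halt = policy(state, symbol, base)
        trajectory.append((state, symbol, write, move))
        if halt:
            break
        if write is not None:
            grid[pos] = write
        pos = (pos[0] + move[0], pos[1] + move[1])
        state = new_state
    cols = sorted(c for (r, c) in grid if r == 2)
    result = "".join(grid[(2, c)] for c in reversed(cols))
    return result, trajectory
```

Because the agent's state set is finite and independent of operand length, the same policy handles inputs of any size; a learner that reproduces the policy exactly would inherit that property, which is the intuition behind length generalization in this setting.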

Paper Structure

This paper contains 5 sections, 1 theorem, 1 equation, 5 figures, 1 table.

Key Result

Theorem 2.1

The map $\psi_p : \Delta_p \to {\mathcal{G}}$ is surjective. In particular, for any precision $p \in {\mathbb{N}}$, the DFST family $\Delta_p$ is Turing-complete when interacting with an external symbolic grid ${\mathcal{U}} ( {\mathbb{Z}} ^2, \Sigma)$. Moreover, any $d$-state grid agent admits em

Figures (5)

  • Figure 1: A DFST agent interacting with an external environment.
  • Figure 2: Binary addition: Training loss (left) and RLG (robust length generalization, right) across training iterations.
  • Figure 3: Decimal addition: Training loss (left) and RLG (robust length generalization, right) across training iterations.
  • Figure 4: Binary multiplication: Training loss (left) and RLG (robust length generalization, right) across training iterations.
  • Figure 5: Decimal multiplication: Training loss (left) and RLG (robust length generalization, right) across training iterations.

Theorems & Definitions (4)

  • Definition 2.1: Symbolic Universe
  • Definition 2.2: Grid Agent
  • Definition 2.3: Differentiable Finite-State Transducer Agent
  • Theorem 2.1: Universality of DFSTs