Parallelizable Neural Turing Machines

Gabriel Faria, Arnaldo Candido Junior

TL;DR

P-NTM is a parallelizable simplification of the Neural Turing Machine that redesigns the core operations of the original architecture to enable efficient scan-based parallel execution, while achieving length generalization performance comparable to the original.

Abstract

We introduce a parallelizable simplification of the Neural Turing Machine (NTM), referred to as P-NTM, which redesigns the core operations of the original architecture to enable efficient scan-based parallel execution. We evaluate the proposed architecture on a synthetic benchmark of algorithmic problems involving state tracking, memorization, and basic arithmetic, solved via autoregressive decoding. We compare it against a revisited, stable implementation of the standard NTM, as well as conventional recurrent and attention-based architectures. Results show that, despite its simplifications, the proposed model attains length generalization performance comparable to the original, learning to solve all problems with perfect accuracy, including at unseen sequence lengths. It also improves training efficiency: parallel execution of P-NTM is up to an order of magnitude faster than the standard NTM. Ultimately, this work contributes toward the development of efficient neural architectures capable of expressing a broad class of algorithms.
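To make "scan-based parallel execution" concrete, the sketch below shows the general idea on a generic linear recurrence h_t = a_t * h_{t-1} + b_t. This is not the P-NTM update itself, which the abstract does not specify; the recurrence, the function names, and the choice of a Hillis-Steele scan are all assumptions made for illustration. The point is only that when each step can be written as an associative composition, the T dependent steps of a sequential loop can be replaced by roughly log2(T) passes whose positions are independent of one another.

```python
from typing import List, Tuple

# Hypothetical illustration: (a, b) encodes the affine map h -> a * h + b.
Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Compose two affine maps: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_recurrence(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Reference loop: h_t = a_t * h_{t-1} + b_t, one dependent step at a time."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def scan_recurrence(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Same recurrence computed with a Hillis-Steele inclusive scan.

    Each pass of the while loop reads only the result of the previous pass,
    so every position within a pass could be computed in parallel. Here the
    passes run on a plain Python list purely to show the data flow.
    """
    elems: List[Pair] = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(len(elems))
        ]
        offset *= 2
    # elems[t] now holds the composed map that takes h0 directly to h_t.
    return [a_t * h0 + b_t for a_t, b_t in elems]
```

Both functions return the same sequence of states; the scan version simply reorganizes the work so that it is bounded by the number of passes rather than the sequence length. Whether a given architecture benefits depends on its updates admitting such an associative form.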

Paper Structure

This paper contains 70 sections, 26 equations, 8 figures, 5 tables, and 3 algorithms.

Figures (8)

  • Figure 1: Illustration of the NTM architecture. Adapted from Graves et al. (2014).
  • Figure 2: Illustration of the parallel computation of P-NTM.
  • Figure 3: Exact match accuracy scores across runs for different architectures and tasks on problems of unseen length ($\ell = 41$ to $\ell = 120$).
  • Figure 4: Speedup of P-NTM sequential and parallel execution relative to the standard NTM execution across input batches of varying sequence lengths.
  • Figure 5: High-level structure shared by all evaluated models. An encoder maps input tokens to embeddings, a sequence-processing module transforms them, and a decoder produces vocabulary logits.
  • ...and 3 more figures