Structured Memory for Neural Turing Machines

Wei Zhang, Yang Yu, Bowen Zhou

TL;DR

The paper investigates how memory organization in Neural Turing Machines affects convergence and overfitting. It proposes three structured-memory architectures (NTM1-NTM3) with hidden memory and hierarchical writing to stabilize memories; experiments on copy and associative recall tasks show NTM1/NTM2 improve convergence speed and reduce outliers relative to baseline NTM, while NTM3 is less stable. Overall, memory structuring can stabilize NTMs and improve learning of long-range sequence tasks. This work demonstrates a viable path to enhance NTMs by rethinking memory layout rather than merely increasing memory capacity.

Abstract

Neural Turing Machines (NTMs) contain a memory component that simulates "working memory" in the brain, storing and retrieving information to ease the learning of simple algorithms. So far, only linearly organized memory has been proposed, and in our experiments we observed that the model does not always converge and overfits easily on certain tasks. We believe the memory component is key to some of the faulty behaviors of NTMs, and that better organization of the memory could help address these problems. In this paper, we propose several memory structures for NTMs, and we show experimentally that two of our proposed structured-memory NTMs lead to better convergence, in terms of speed and prediction accuracy, on the copy task and the associative recall task of Graves et al. (2014).

Paper Structure

This paper contains 5 sections, 3 equations, 2 figures.

Figures (2)

  • Figure 1: The NTM and NTM variants that use long short-term memory (LSTM) [5] as controllers. Every module in these models is updated recurrently through time using its previous state. NTM1 contains only one write head, and one of its memories, $\mathbf{M}_h$, is not written directly by the controller; this differs from $\mathbf{M}_2$ in NTM2, which is written by both $\mathbf{M}_1$ and a write head simultaneously. NTM3 differs from NTM2 in that its write heads take inputs from two layers. NTM3 could also be extended to more layers.
  • Figure 2: Comparison of convergence speed and quality on the copy task and the associative recall task. The four panels on the left are from the copy task, and the four on the right are from the associative recall task. In every panel, the X-axis is the number of iterations (sampled once every 25 iterations). For the copy task, the Y-axis shows the binary cross-entropy loss per item (sequence of length 8); for the recall task, it shows the log of the binary cross-entropy loss. Each memory block shown in Figure 1 holds 128 vectors of length 20.
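
The NTM1 arrangement described in the Figure 1 caption can be illustrated with a rough sketch. The erase/add write is the standard NTM write from Graves et al. (2014). The rule used here to update the hidden memory, and the names `update_hidden`, `ntm1_step`, and `mix`, are assumptions made for illustration only: this section does not give the paper's equations, so the sketch simply lets $\mathbf{M}_h$ track $\mathbf{M}_1$ without being touched by the write head.

```python
# Sketch only. The erase/add write follows Graves et al. (2014); the
# hidden-memory update rule and the `mix` parameter are hypothetical.

N_SLOTS, WIDTH = 128, 20  # memory size stated in the Figure 2 caption


def write(memory, weights, erase, add):
    """Standard NTM write: M[i] <- M[i] * (1 - w[i] * e) + w[i] * a."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for row, w in zip(memory, weights)
    ]


def update_hidden(hidden, written, mix=0.5):
    """Assumed smoothing rule: M_h <- mix * M_h + (1 - mix) * M_1."""
    return [
        [mix * h + (1.0 - mix) * m for h, m in zip(h_row, m_row)]
        for h_row, m_row in zip(hidden, written)
    ]


def ntm1_step(m1, mh, weights, erase, add, mix=0.5):
    """One NTM1-style step: the single write head touches only M_1."""
    m1 = write(m1, weights, erase, add)
    mh = update_hidden(mh, m1, mix)  # M_h is fed by M_1, not by the head
    return m1, mh


m1 = [[0.0] * WIDTH for _ in range(N_SLOTS)]
mh = [[0.0] * WIDTH for _ in range(N_SLOTS)]
weights = [1.0] + [0.0] * (N_SLOTS - 1)  # head fully focused on slot 0
m1, mh = ntm1_step(m1, mh, weights, erase=[1.0] * WIDTH, add=[1.0] * WIDTH)
```

With the head focused on slot 0, that row of `m1` is overwritten immediately, while the same row of `mh` moves only part of the way toward it. This is one plausible reading of how an indirectly written memory could damp abrupt changes; the actual update used in the paper may differ.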