Recurrent Memory Array Structures

Kamil Rocki

TL;DR

This work investigates augmenting the LSTM with multiple memory cells per hidden unit (Array-LSTM) to improve sequence modeling and generalization. It contrasts deterministic extensions (lane selection via soft attention and max-pooling) with stochastic variants (stochastic output pooling and stochastic memory arrays) intended to mitigate overfitting. The Stochastic Memory Array reaches 1.402 BPC on enwik8, improving on the prior state of the art, and the report establishes neural baselines of 1.12 BPC on enwik9 and 1.19 BPC on enwik10, which points to a regularizing effect of stochastic memory operations. Overall, the findings suggest that memory-augmented architectures can reach state-of-the-art character-level prediction when paired with stochastic regularization and sufficient data.

Abstract

The following report introduces ideas for augmenting the standard Long Short-Term Memory (LSTM) architecture with multiple memory cells per hidden unit in order to improve its generalization capabilities. It considers both deterministic and stochastic variants of memory operation. It is shown that the nondeterministic Array-LSTM approach improves state-of-the-art performance on character-level text prediction, achieving 1.402 BPC on the enwik8 dataset. Furthermore, this report establishes baseline neural-based results of 1.12 BPC and 1.19 BPC for the enwik9 and enwik10 datasets, respectively.

Paper Structure

This paper contains 32 sections, 23 equations, 10 figures, 1 table.

Figures (10)

  • Figure 2.1: Simple RNN unit (omitted implementation specifics); $h^t$ - internal (hidden) state at time step $t$; $x$ are inputs, $y$ are optional outputs to be emitted; All connections are learnable parameters. No explicit asynchronous memory, implicit history aggregation only through hidden states $h$. Omitted bias terms for brevity.
  • Figure 3.1: LSTM, biases and nonlinearities omitted for brevity; The memory content $c^{t}$ is set according to the gates' activations, which are in turn driven by the bottom-up input $x^t$ and previous internal state $h^{t-1}$.
  • Figure 3.2: The structure of the cerebellar cortex (Kanerva, 1988)
  • Figure 3.3: Array-LSTM with 4 memory cells per hidden control unit; modulated vs modulating connections, omitted nonlinearities and biases for brevity; It becomes a standard LSTM when only 1 memory cell per hidden unit is present
  • Figure 4.1: Lane selection through soft attention; solid lines are control logic signals (from/to gates), dotted lines are memory cell lanes used upon selection, dashed lines represent the carry lanes used otherwise; dotted and dashed lanes are mutually exclusive
  • ...and 5 more figures
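
The captions for Figures 3.1 and 3.3 describe a hidden unit that controls several memory cells ("lanes") and that reduces to a standard LSTM when only one cell is present. A minimal Python sketch of one such step is given below. It is an illustration under stated assumptions, not the report's implementation: the captions omit the equations, so the sketch assumes each lane has its own forget, input and output gates plus a candidate value, all driven by $x^t$ and $h^{t-1}$, and that the hidden state is the sum of the gated lane outputs. The names `init_params`, `gate` and `array_lstm_step` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, n_lanes, seed=0):
    """Hypothetical initializer: one (W, b) pair per gate, per lane.

    Each W has n_hidden rows and n_in + n_hidden columns, so it acts on
    the concatenation [x, h_prev].
    """
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                for _ in range(n_hidden)]

    return [{name: (mat(), [0.0] * n_hidden) for name in ("f", "i", "o", "g")}
            for _ in range(n_lanes)]


def gate(lane, name, v, act):
    """Apply act(W v + b) for the named gate of one lane."""
    W, b = lane[name]
    return [act(sum(w * a for w, a in zip(row, v)) + bias)
            for row, bias in zip(W, b)]


def array_lstm_step(params, x, h_prev, c_prev):
    """One time step; c_prev holds one cell vector per lane.

    Assumed update (not taken from the captions):
        c_k = f_k * c_k_prev + i_k * g_k      for each lane k
        h   = sum_k o_k * tanh(c_k)
    """
    v = list(x) + list(h_prev)          # gates see input and previous state
    h = [0.0] * len(h_prev)
    c = []
    for lane, c_old in zip(params, c_prev):
        f = gate(lane, "f", v, sigmoid)      # forget gate
        i = gate(lane, "i", v, sigmoid)      # input gate
        o = gate(lane, "o", v, sigmoid)      # output gate
        g = gate(lane, "g", v, math.tanh)    # candidate memory content
        c_new = [fj * cj + ij * gj
                 for fj, cj, ij, gj in zip(f, c_old, i, g)]
        h = [hj + oj * math.tanh(cj) for hj, oj, cj in zip(h, o, c_new)]
        c.append(c_new)
    return h, c
```

Calling `array_lstm_step(init_params(3, 2, 4), [0.5, -0.2, 0.1], [0.0, 0.0], [[0.0, 0.0]] * 4)` runs one step with 4 lanes, matching the configuration in Figure 3.3. With `n_lanes=1` the loop body is the usual LSTM update, which mirrors the caption's remark about the single-cell case.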