Recurrent Memory Array Structures
Kamil Rocki
TL;DR
This work investigates augmenting LSTM with multiple memory cells per hidden unit (Array-LSTM) to improve sequence modeling and generalization. It contrasts deterministic extensions (lane selection via soft attention and max-pooling) with stochastic variants (stochastic output pooling and stochastic memory arrays) intended to mitigate overfitting. Empirically, the Stochastic Memory Array reaches 1.402 BPC on enwik8, improving on the previous state of the art, and the work establishes neural baselines on enwik9 and enwik10, highlighting the regularizing effect of stochastic memory operations. Overall, the findings suggest that multi-cell memory architectures can reach state-of-the-art character-level prediction when paired with appropriate stochastic regularization and sufficient data.
Abstract
This report introduces ideas for augmenting the standard Long Short-Term Memory (LSTM) architecture with multiple memory cells per hidden unit in order to improve its generalization capabilities. It considers both deterministic and stochastic variants of the memory operation. The nondeterministic Array-LSTM approach is shown to improve state-of-the-art performance on character-level text prediction, achieving 1.402 BPC on the enwik8 dataset. Furthermore, this report establishes baseline neural results of 1.12 BPC and 1.19 BPC for the enwik9 and enwik10 datasets, respectively.
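For readers unfamiliar with the metric, bits per character (BPC) is the average negative base-2 log-probability a model assigns to the correct next character. The sketch below illustrates that definition only and is not the report's evaluation code; the function names and the example probabilities are hypothetical. The size estimate assumes an ideal entropy coder and ignores the cost of storing the model.

```python
import math


def bits_per_character(probs):
    """Average negative log2-probability assigned to the true characters."""
    if not probs:
        raise ValueError("need at least one probability")
    return -sum(math.log2(p) for p in probs) / len(probs)


def ideal_compressed_bytes(bpc, n_chars):
    """Size an ideal entropy coder would reach at this BPC (assumption:
    one character per byte of input, no coder or model overhead)."""
    return bpc * n_chars / 8


# Hypothetical probabilities a model gave to each correct next character.
example_probs = [0.5, 0.25, 0.5, 0.125]
bpc = bits_per_character(example_probs)
print(f"{bpc:.3f} BPC")  # prints "1.750 BPC"
```

Under the same assumptions, `ideal_compressed_bytes(1.402, 10**8)` gives roughly 17.5 million bytes for a 10^8-character input, which is why BPC results on these datasets are often compared with compression ratios.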
