Recurrent Memory Networks for Language Modeling
Ke Tran, Arianna Bisazza, Christof Monz
TL;DR
The paper tackles the interpretability gap in RNN-based language modeling by introducing the Recurrent Memory Network (RMN), which augments an LSTM with a Memory Block (MB) that attends to recent history. The MB uses attention and a gating mechanism to couple memory with the LSTM state, enabling both stronger predictive performance and inspection of what the model retains over time. Empirically, RMN yields lower perplexities than LSTM baselines on English, German, and Italian and sets a new state of the art on the Sentence Completion Challenge (69.2% accuracy), while the attention analyses reveal long-distance dependencies and syntactic patterns learned without explicit supervision. The work thus pairs improved accuracy with interpretability, and because the change is a simple architectural modification, it may carry over to other NLP tasks.
Abstract
Recurrent Neural Networks (RNNs) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only amplifies the power of RNNs but also facilitates our understanding of their internal functioning and allows us to discover underlying patterns in data. We demonstrate the power of RMN on language modeling and sentence completion tasks. On language modeling, RMN outperforms the Long Short-Term Memory (LSTM) network on three large German, Italian, and English datasets. Additionally, we perform an in-depth analysis of the various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state of the art by a large margin.
