Recurrent Memory Networks for Language Modeling

Ke Tran, Arianna Bisazza, Christof Monz

TL;DR

The paper tackles the interpretability gap in RNN-based language modeling by introducing the Recurrent Memory Network (RMN), which augments an LSTM with a Memory Block (MB) that attends to recent history. The MB uses attention and a gating mechanism to couple memory with the LSTM state, which both improves predictive performance and allows inspection of what the model retains over time. Empirically, RMN yields lower perplexities than strong baselines on English, German, and Italian and sets a new state of the art on the Sentence Completion Challenge, while the attention analyses reveal long-distance dependencies and syntactic patterns learned without explicit supervision. The work thus couples improved accuracy with interpretability through a simple architectural modification that may also apply to other NLP tasks.
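
As a rough illustration of the mechanism summarized above, the following Python sketch attends over a short word history and then gates the result with the LSTM state. It is not the paper's formulation, which this page does not reproduce: the dot-product scoring, the element-wise sigmoid gate, and all names (`memory_block`, `gate_w`, `gate_b`) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def memory_block(h, history, gate_w, gate_b):
    """Hypothetical Memory Block step (illustrative, not the paper's equations).

    h        -- current LSTM state, a list of d floats
    history  -- embeddings of the n most recent words, each a list of d floats
    gate_w, gate_b -- assumed per-dimension gate parameters, d floats each
    Returns the gated output vector and the attention weights.
    """
    d = len(h)
    # Assumed scoring: dot product between the state and each history word.
    scores = [sum(hj * mj for hj, mj in zip(h, m)) for m in history]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the history embeddings.
    context = [sum(w * m[j] for w, m in zip(weights, history)) for j in range(d)]
    # Assumed gate: element-wise sigmoid mixing the context with the state.
    gate = [sigmoid(gate_w[j] * (h[j] + context[j]) + gate_b[j]) for j in range(d)]
    out = [g * c + (1.0 - g) * hj for g, c, hj in zip(gate, context, h)]
    return out, weights


# Toy usage with memory size 4 and 2-dimensional vectors.
h = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
out, weights = memory_block(h, history, gate_w=[0.0, 0.0], gate_b=[0.0, 0.0])
```

The returned `weights` are what makes such a model inspectable: plotting them per history position is the kind of analysis the attention figures listed further down summarize.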

Abstract

Recurrent Neural Networks (RNNs) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only amplifies the power of RNNs but also facilitates our understanding of their internal functioning and allows us to discover underlying patterns in data. We demonstrate the power of RMN on language modeling and sentence completion tasks. On language modeling, RMN outperforms the Long Short-Term Memory (LSTM) network on three large German, Italian, and English datasets. Additionally, we perform an in-depth analysis of various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state of the art by a large margin.

Paper Structure

This paper contains 14 sections, 5 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: A graphical representation of the MB.
  • Figure 2: A graphical illustration of an unfolded RMR with memory size 4. Dashed line indicates concatenation. The MB takes the output of the bottom LSTM layer and the 4-word history as its input. The output of the MB is then passed to the second LSTM layer on top. There is no direct connection between MBs of different time steps. The last LSTM layer carries the MB's outputs recurrently.
  • Figure 3: Average attention per position of RMN history. Top: RMR(-tM-g), bottom: RM(+tM-g). Rightmost positions represent most recent history.
  • Figure 4: Attention visualization of 100 word samples. Bottom positions in each plot represent most recent history. Darker color means higher weight.
  • Figure 5: Examples of distant memory positions attended by RMN. The resulting top five word predictions are shown with the respective log-probabilities. The correct choice (in bold) was ranked first in sentences (a,b) and second in (c).
  • ...and 2 more figures