Learning to Remember Rare Events

Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio

TL;DR

The paper tackles lifelong and one-shot learning by introducing a scalable, differentiable memory module that stores key-value pairs and supports fast k-nearest-neighbor queries. It integrates with CNNs, sequence-to-sequence models (including GNMT), and the Extended Neural GPU, guided by a margin-based memory loss and an age-based update policy to retain useful past examples. Experiments demonstrate state-of-the-art one-shot results on Omniglot, strong memory-driven performance on a synthetic task, and meaningful translation gains from memory context, including rare word handling. The work highlights practical benefits for memory-enabled AI with potential for explainability, while acknowledging the need for better one-shot evaluation metrics and further refinements to memory dynamics.

Abstract

Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and it can successfully generalize from them. We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task.
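
The mechanism the abstract outlines (key-value storage, a nearest-neighbor query, a margin-based loss, and age-based overwriting) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes cosine similarity over unit-normalized keys, uses exact brute-force search where the paper relies on fast nearest-neighbor algorithms, assumes that a correct hit merges the query into the stored key, and assumes a hinge form for the margin loss. The names `KeyValueMemory`, `normalize`, `margin_loss`, and the parameter `alpha` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def margin_loss(sim_pos, sim_neg, alpha=0.1):
    """Hinge-style loss (assumed form): penalize a wrong-label neighbor
    whose similarity comes within `alpha` of the correct-label neighbor."""
    return max(0.0, sim_neg - sim_pos + alpha)


class KeyValueMemory:
    """Toy key-value memory with a nearest-neighbor query and age-based writes."""

    def __init__(self, size, key_dim):
        self.keys = [[0.0] * key_dim for _ in range(size)]
        self.values = [None] * size  # None marks an empty slot
        self.ages = [0] * size       # steps since each slot was last written

    def query(self, q):
        """Return (slot index, stored value, cosine similarity) of the nearest key."""
        q = normalize(q)
        sims = [sum(a * b for a, b in zip(q, key)) for key in self.keys]
        best = max(range(len(sims)), key=sims.__getitem__)
        return best, self.values[best], sims[best]

    def update(self, q, value):
        """Refresh the matching slot on a correct hit, else overwrite the oldest slot."""
        q = normalize(q)
        idx, stored, _ = self.query(q)
        self.ages = [age + 1 for age in self.ages]
        if stored is not None and stored == value:
            # Correct hit (assumed policy): merge the query into the key, reset its age.
            merged = [a + b for a, b in zip(self.keys[idx], q)]
            self.keys[idx] = normalize(merged)
            self.ages[idx] = 0
        else:
            # Miss: write the new pair into the slot that has gone longest unwritten.
            oldest = max(range(len(self.ages)), key=self.ages.__getitem__)
            self.keys[oldest] = q
            self.values[oldest] = value
            self.ages[oldest] = 0


memory = KeyValueMemory(size=4, key_dim=2)
memory.update([1.0, 0.0], "a")
memory.update([0.0, 1.0], "b")
slot, value, similarity = memory.query([0.9, 0.1])
print(slot, value)
```

Because empty slots keep aging while written slots are reset, new pairs fill unused slots first and only later displace the entries that have gone longest without a matching query. In this sketch, that is how a rarely seen example can survive many update steps.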

Paper Structure

This paper contains 17 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The operation of the memory module on a query $q$ with correct value $v$; see text for details.
  • Figure 2: The GNMT model with added memory module. On each decoding step $t$, the result of the attention $a_t$ is used to query the memory. The resulting value is combined with the output of the final LSTM layer to produce the predicted logits $\hat{y}_t$. See text for further details.
  • Figure 3: Extended Neural GPU with memory module. Memory query is read from the position one below the current output logit, and the embedded memory value is put at the same position of the output tape $p$. The network learns to use these values to produce the output in the next step.