Hierarchical Memory Networks

Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio

TL;DR

This paper tackles the scalability problem of memory networks by introducing Hierarchical Memory Networks (HMNs) that organize memory hierarchically and use Maximum Inner Product Search (MIPS) for efficient top-K retrieval. It formulates K-MIPS attention, enabling a softmax over a small subset of memory rather than the full memory, and demonstrates that exact K-MIPS can match or improve upon softmax performance while approximate K-MIPS provides substantial training-time speedups. The approach is validated on SimpleQuestions, showing improved accuracy with exact K-MIPS and confirming the trade-off between speed and accuracy for approximate MIPS methods like cluMIPS. Overall, HMNs offer a scalable, end-to-end trainable framework for large-memory question answering and related tasks, with future work aimed at dynamic memory updates during training.

Abstract

Memory networks are neural networks with an explicit memory component that can be both read and written to by the network. The memory is often addressed in a soft way using a softmax function, making end-to-end training with backpropagation possible. However, this is not computationally scalable for applications which require the network to read from extremely large memories. On the other hand, it is well known that hard attention mechanisms based on reinforcement learning are challenging to train successfully. In this paper, we explore a form of hierarchical memory network, which can be considered as a hybrid between hard and soft attention memory networks. The memory is organized in a hierarchical structure such that reading from it is done with less computation than soft attention over a flat memory, while also being easier to train than hard attention over a flat memory. Specifically, we propose to incorporate Maximum Inner Product Search (MIPS) in the training and inference procedures for our hierarchical memory network. We explore the use of various state-of-the-art approximate MIPS techniques and report results on SimpleQuestions, a challenging large-scale factoid question answering task.
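
To make the idea of a softmax over only the top-K memory entries concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `dot` and `k_mips_attention` and the toy memory are hypothetical, and the brute-force scan below stands in for a real MIPS index. An exact scan like this still scores every memory entry, so it only illustrates the attention computation; any speedup would have to come from an approximate MIPS structure replacing the scan.

```python
import heapq
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def k_mips_attention(query, memory, k):
    """Softmax attention restricted to the k memory rows with the largest
    inner product with the query (brute-force, exact top-k search)."""
    # Score every memory row; an approximate MIPS index would avoid this scan.
    scores = [dot(query, row) for row in memory]
    # Indices of the k best-scoring rows, best first.
    top = heapq.nlargest(k, range(len(memory)), key=scores.__getitem__)
    # Numerically stable softmax over the selected scores only.
    max_score = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - max_score) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected rows is the value read from memory.
    dim = len(memory[0])
    readout = [sum(w * memory[i][d] for w, i in zip(weights, top))
               for d in range(dim)]
    return top, weights, readout


# Toy example (assumed data): four 2-dimensional memory rows, K = 2.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
top, weights, readout = k_mips_attention([2.0, 1.0], memory, k=2)
print(top, weights, readout)
```

Rows outside the top K receive exactly zero weight, so gradients would flow only through the K selected entries, which is the sense in which this sits between hard and soft attention.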

Paper Structure

This paper contains 12 sections, 9 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Accuracy on the SQ test set and average size of memory used. 10-softmax achieves high performance while using only a smaller amount of memory.
  • Figure 2: Validation curve for various models. Convergence is not slowed down by k-softmax.