The Kanerva Machine: A Generative Distributed Memory

Yan Wu, Greg Wayne, Alex Graves, Timothy Lillicrap

TL;DR

The Kanerva Machine is a memory-augmented generative framework that combines a fast, distributed memory with a deep perceptual model. By treating memory updates as exact Bayesian inference and memory reads as a data-dependent prior, the model achieves significantly better conditional generation than a VAE on Omniglot and CIFAR while remaining easier to train than Differentiable Neural Computers. The approach yields interpretable memory usage, supports iterative sampling to refine outputs, and enables one-shot generation and denoising. The work demonstrates that principled memory updates and memory-conditioned priors can substantially improve generative modeling and provide a scalable, online-adaptable memory for complex data. It also opens avenues for integrating classical probabilistic memory with neural networks.

Abstract

We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva's sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update-rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.
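
To make the "Bayesian update-rule" concrete, here is a minimal sketch of one online write into a linear-Gaussian memory. The abstract does not spell out the update, so everything here is an assumption for illustration: the memory M is taken to have a Gaussian distribution with mean R (K slots by C code dimensions) and row covariance U (K by K), and an observed code z is modeled as z = w^T M plus isotropic noise. The names `bayes_write`, `read_mean`, `R`, `U`, `w`, `z`, and `noise_var` are hypothetical, and the addressing weights w are simply given rather than inferred. Under those assumptions the posterior is the standard Kalman-style rank-one update:

```python
def bayes_write(R, U, w, z, noise_var=1.0):
    """One Bayesian write of code z at addressing weights w.

    Assumed model (illustrative, not taken from the abstract):
    z = w^T M + noise, where M has mean R (K x C) and row covariance U (K x K).
    Returns the posterior mean and row covariance (R_new, U_new).
    """
    K, C = len(R), len(R[0])
    # Predict z from the current memory mean and compute the prediction error.
    pred = [sum(w[k] * R[k][c] for k in range(K)) for c in range(C)]
    delta = [z[c] - pred[c] for c in range(C)]
    # Cross-covariance between each memory row and the observation: U w.
    sigma_c = [sum(U[k][j] * w[j] for j in range(K)) for k in range(K)]
    # Scalar variance of the observation: w^T U w + noise variance.
    sigma_z = sum(w[k] * sigma_c[k] for k in range(K)) + noise_var
    # Move the mean toward the observation; shrink the uncertainty.
    R_new = [[R[k][c] + sigma_c[k] * delta[c] / sigma_z for c in range(C)]
             for k in range(K)]
    U_new = [[U[k][j] - sigma_c[k] * sigma_c[j] / sigma_z for j in range(K)]
             for k in range(K)]
    return R_new, U_new


def read_mean(R, w):
    """Expected read-out w^T R for addressing weights w."""
    K, C = len(R), len(R[0])
    return [sum(w[k] * R[k][c] for k in range(K)) for c in range(C)]


if __name__ == "__main__":
    # Two slots, two code dimensions, zero-mean prior with identity covariance.
    R = [[0.0, 0.0], [0.0, 0.0]]
    U = [[1.0, 0.0], [0.0, 1.0]]
    w = [1.0, 0.0]
    for _ in range(3):
        R, U = bayes_write(R, U, w, [2.0, 4.0])
        print(read_mean(R, w), U[0][0])
```

Each repeated write of the same code pulls the read-out closer to that code while the variance of the addressed slot shrinks, which is the sense in which such an update compresses observations online. In the full model the weights would be distributed over many slots rather than one-hot as in this toy run.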

Paper Structure

This paper contains 21 sections, 17 equations, 12 figures, 2 algorithms.

Figures (12)

  • Figure 1: The probabilistic graphical model for the Kanerva Machine. Left: the generative model. Center: the reading inference model. Right: the writing inference model. Dotted lines show approximate inference and dashed lines represent exact inference.
  • Figure 2: The negative variational lower bound (left), reconstruction loss (center), and KL-divergence (right) during learning. The dip in the KL-divergence suggests that our model has learned to use the memory.
  • Figure 3: Left: reconstruction of inputs and the weights used in reconstruction, where each bin represents the weight over one memory slot. Weights are widely distributed across memory slots. Right: denoising through iterative reading. In each panel: the first column shows the original pattern, the second column (in boxes) shows the corrupted pattern, and the following columns show the reconstruction after 1, 2 and 3 iterations.
  • Figure 4: One-shot generation given a batch of examples. The first panel shows reference samples from the matched VAE. The other panels show samples from our model conditioned on 12 random examples from the specified number of classes. Conditioning examples are shown above the samples. The 5 columns show samples after 0, 2, 4, 6, and 8 iterations.
  • Figure 5: Comparison of samples from CIFAR. The 24 conditioning images (top-right) are randomly sampled from the entire CIFAR dataset, so they contain a mix of many classes. Samples from the matched VAE are blurred and lack meaningful local structure. In contrast, samples from the Kanerva Machine have clear local structures, despite using the same encoder and decoder as the VAE. The 5 columns show samples after 0, 2, 4, 6, and 8 iterations.
  • ...and 7 more figures