Learning Representations by Maximizing Compression

Karol Gregor, Yann LeCun

TL;DR

This work treats data modeling as predicting a sequence of bits with probabilities P(x_k|x_{1:k-1}), using a two-path autoencoder-like predictor that yields an exact data likelihood. The model, defined by matrices U, V, R and biases, updates a hidden representation as each new pixel arrives and outputs a Bernoulli probability for the next bit, so the likelihood can drive arithmetic coding directly for compression. Empirically, the learned filters resemble RBM- and denoising-autoencoder-like features on USPS and MNIST, and the model can generate independent digit samples by sweeping once through the pixel sequence. Across compression benchmarks, including comparisons to DjVu, RBMs, and center-difference baselines, the full model achieves performance ranging from competitive to state-of-the-art (e.g., about 81.0 bits for USPS and 92.2 bits for MNIST with 1000 units) while allowing flexible permutation of pixel order and explicit likelihood computation.

Abstract

We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of computations similar to an autoencoder. The likelihood under the model can be calculated exactly, and arithmetic coding can be used directly for compression. When trained on digits, the algorithm learns filters similar to those of restricted Boltzmann machines and denoising autoencoders. Independent samples can be drawn from the model by a single sweep through the pixels. The algorithm achieves good compression performance when compared to other methods that work under random ordering of pixels.
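The link between an exact sequential likelihood and compression can be illustrated with a minimal sketch. It computes the ideal code length, the sum over k of -log2 P(x_k | x_1..x_{k-1}), which an arithmetic coder can approach closely. The function names and the two predictors here (`uniform_predictor`, `laplace_predictor`) are hypothetical stand-ins chosen for illustration; they are not the model described in the paper.

```python
import math


def code_length_bits(bits, predict):
    """Ideal code length in bits: sum over k of -log2 P(x_k | earlier bits).

    `predict(prefix)` must return P(next bit = 1 | prefix), strictly in (0, 1).
    """
    total = 0.0
    for k, x in enumerate(bits):
        p_one = predict(bits[:k])            # uses only bits already seen
        p = p_one if x == 1 else 1.0 - p_one
        total += -math.log2(p)
    return total


def uniform_predictor(prefix):
    # Hypothetical baseline: knows nothing, so every bit costs exactly 1 bit.
    return 0.5


def laplace_predictor(prefix):
    # Hypothetical adaptive baseline (Laplace's rule of succession).
    # Assumption for illustration only; not the paper's predictor.
    return (sum(prefix) + 1) / (len(prefix) + 2)


if __name__ == "__main__":
    data = [1, 1, 1, 1, 1, 1, 0, 1]
    print("uniform :", code_length_bits(data, uniform_predictor))
    print("adaptive:", code_length_bits(data, laplace_predictor))
```

A better predictor assigns higher probability to the bits that actually occur, which shortens the code; this is why maximizing likelihood and maximizing compression are the same objective.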

Paper Structure

This paper contains 12 sections, 2 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: A diagram of our system. The system predicts pixels of the input sequentially. A given pixel is predicted only from pixels already seen. The prediction function consists of two parts. The $R$ path is a direct matrix multiplication. The $UV$ path is a matrix multiplication followed by a sigmoid and then by another matrix multiplication. The two paths are added and passed through a final sigmoid to generate the prediction of the next pixel.
  • Figure 2: Examples of filters learned on MNIST digits. The first column is the $U$ matrix, the second column is the $V$ matrix and the third column is the $R$ matrix. In the first row the pixel order went from upper left to lower right (reading order). In the second row the permutation was random and changed at each iteration. In the third row the permutation was the same as in the previous row, but in addition the average pixel values were not subtracted from the input ($\bar{x}=x$).
  • Figure 3: Generated digits after training on MNIST with a) the system with the $R$ path only and b) the full system. For each image the pixels were generated in sequence from upper left to lower right. Given $n$ already generated pixels, the new pixel was generated by calculating its probability under the model and sampling. System a) mostly captures the local structure, whereas b) captures the full structure and often generates a plausible digit. Note that this way we obtain independent samples, each requiring one sweep through the pixels.
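The two-path predictor of Figure 1 and the sampling sweep of Figure 3 can be sketched as follows. This is an illustrative reading of the captions, not the authors' implementation: the matrix shapes, the order in which $U$ and $V$ are applied, the bias names `b` and `c`, the helper `random_params`, and the omission of mean subtraction are all assumptions. The weights are random rather than trained, so the samples are noise; only the control flow is meant to be informative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_next(x, k, U, V, R, b, c):
    """P(x_k = 1 | x_0..x_{k-1}); reads only the first k entries of x."""
    n_hidden = len(b)
    # UV path: matrix multiply over seen pixels, sigmoid, matrix multiply.
    h = [sigmoid(b[j] + sum(U[j][i] * x[i] for i in range(k)))
         for j in range(n_hidden)]
    uv = sum(V[k][j] * h[j] for j in range(n_hidden))
    # R path: direct matrix multiply over seen pixels.
    r = sum(R[k][i] * x[i] for i in range(k))
    # Both paths are added and passed through a final sigmoid.
    return sigmoid(uv + r + c[k])


def sample_image(n_pixels, U, V, R, b, c, rng):
    """One independent sample: a single sweep, sampling pixel by pixel."""
    x = []
    for k in range(n_pixels):
        p = predict_next(x, k, U, V, R, b, c)
        x.append(1 if rng.random() < p else 0)
    return x


def random_params(n_pixels, n_hidden, rng, scale=0.1):
    # Hypothetical helper: small random weights standing in for trained ones.
    def mat(rows, cols):
        return [[rng.gauss(0.0, scale) for _ in range(cols)]
                for _ in range(rows)]
    U = mat(n_hidden, n_pixels)   # pixels -> hidden units (assumed shape)
    V = mat(n_pixels, n_hidden)   # hidden units -> next-pixel logit
    R = mat(n_pixels, n_pixels)   # only entries R[k][i] with i < k are used
    return U, V, R, [0.0] * n_hidden, [0.0] * n_pixels


if __name__ == "__main__":
    rng = random.Random(0)
    params = random_params(n_pixels=16, n_hidden=4, rng=rng)
    print(sample_image(16, *params, rng))
```

Because each prediction reads only pixels with index below `k`, the same function serves both for scoring a given image and for generating a new one.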