Learning Representations by Maximizing Compression
Karol Gregor, Yann LeCun
TL;DR
This work treats data modeling as predicting a sequence of bits with probabilities P(x_k|x_{1:k-1}), using a two-path autoencoder-like predictor that yields an exact data likelihood. The model, defined by matrices U, V, R and biases, updates a hidden representation as each new pixel arrives and outputs a Bernoulli probability for the next bit, so the likelihood can drive an arithmetic coder directly for compression. Empirically, the learned filters resemble RBM- and denoising-autoencoder-like features on USPS and MNIST, and the model can generate independent digit samples in a single sweep through the pixel sequence. Across compression benchmarks, including comparisons to DjVu, RBMs, and center-difference baselines, the full model achieves competitive or state-of-the-art performance (e.g., about 81.0 bits for USPS and 92.2 bits for MNIST with 1000 units) while allowing flexible permutation of pixel order and explicit likelihood computation.
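The link between likelihood and compressed size can be illustrated with a short sketch. An arithmetic coder driven by the probabilities P(x_k|x_{1:k-1}) spends close to -log2 P(x_k|x_{1:k-1}) bits on each symbol, so the total code length is approximately the negative log2-likelihood of the sequence. The function `code_length_bits` and the two toy predictors below are hypothetical names introduced for illustration; they stand in for any sequential predictor and are not the model described here.

```python
import math

def code_length_bits(bits, predict):
    """Ideal arithmetic-coding cost, in bits, of `bits` under a sequential predictor.

    `predict(prefix)` is a hypothetical callable returning
    P(next bit = 1 | prefix); any sequential model could be plugged in.
    """
    total = 0.0
    for k, x in enumerate(bits):
        p1 = predict(bits[:k])            # probability that bit k is 1
        p = p1 if x == 1 else 1.0 - p1    # probability of the bit actually seen
        total += -math.log2(p)
    return total

def uniform(prefix):
    # Knows nothing: every bit costs exactly one bit.
    return 0.5

def laplace(prefix):
    # Add-one estimate of the fraction of ones seen so far.
    return (sum(prefix) + 1) / (len(prefix) + 2)

bits = [0, 0, 0, 0, 0, 0, 1, 0]
uniform_cost = code_length_bits(bits, uniform)   # 8.0 bits
laplace_cost = code_length_bits(bits, laplace)   # log2(72), about 6.17 bits
print(uniform_cost, laplace_cost)
```

A predictor that assigns higher probability to the bits that actually occur yields a shorter code, which is why maximizing likelihood and maximizing compression are the same objective.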
Abstract
We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of computations similar to those of an autoencoder. The likelihood under the model can be calculated exactly, and arithmetic coding can be used directly for compression. When trained on digits, the algorithm learns filters similar to those of restricted Boltzmann machines and denoising autoencoders. Independent samples can be drawn from the model by a single sweep through the pixels. The algorithm achieves good compression performance compared with other methods that work under random ordering of pixels.
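The following is a minimal sketch of how one sweep over the pixels can both score an image exactly and draw a sample. It assumes a simplified form chosen for illustration: a hidden pre-activation that accumulates one column of an input matrix U per observed pixel, and an output row of V per predicted pixel. The paper's actual parameterization (which also involves a matrix R) may differ, and the function name `sweep` and the random parameters are hypothetical.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sweep(U, V, b, c, x=None, rng=None):
    """One pass over D pixels under an assumed, simplified sequential model.

    U: H x D input weights, V: D x H output weights, b: H hidden biases,
    c: D output biases. If `x` is given it is scored; otherwise a sample
    is drawn. Returns (pixels, code length in bits).
    """
    D, H = len(c), len(b)
    rng = rng or random.Random()
    a = list(b)          # hidden pre-activation built from pixels seen so far
    pixels, bits = [], 0.0
    for k in range(D):
        h = [sigmoid(v) for v in a]
        p1 = sigmoid(c[k] + sum(V[k][j] * h[j] for j in range(H)))
        xk = x[k] if x is not None else int(rng.random() < p1)
        bits -= math.log2(p1 if xk == 1 else 1.0 - p1)
        for j in range(H):               # fold the new pixel into the hidden state
            a[j] += U[j][k] * xk
        pixels.append(xk)
    return pixels, bits

# Toy parameters (random, untrained) just to exercise the code.
D, H = 16, 4
init = random.Random(0)
U = [[init.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
V = [[init.gauss(0, 0.5) for _ in range(H)] for _ in range(D)]
b = [0.0] * H
c = [0.0] * D

sample, sample_bits = sweep(U, V, b, c, rng=random.Random(1))  # draw a sample
_, rescored_bits = sweep(U, V, b, c, x=sample)                 # score it exactly
print(sample, sample_bits, rescored_bits)
```

In this sketch each sweep costs on the order of D times H operations, comparable to one encode and decode pass of an autoencoder with H hidden units, and the bit count returned is the exact negative log2-likelihood under the assumed model.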
