Better Mixing via Deep Representations

Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai

TL;DR

The paper addresses the challenge of slow mixing in generative models by exploring whether deeper representations that better disentangle factors of variation can speed up Markov chain mixing. It evaluates DBNs (RBMs) and CAEs across MNIST and TFD, showing that sampling in higher-level representation spaces yields faster mixing, higher-quality samples, and more effective interpolation between data points. The authors link these findings to hypotheses about disentangling leading to manifold unfolding and expansion of high-density regions, while also demonstrating that discriminative performance does not degrade and can even improve with depth. The work suggests practical benefits for MCMC-based learning and generation, and points to potential synergy with tempering-based methods for further speedups.

Abstract

It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation. We study the following related conjecture: better representations, in the sense of better disentangling, can be exploited to produce faster-mixing Markov chains. Consequently, mixing would be more efficient at higher levels of representation. To better understand why and how this is happening, we propose a secondary conjecture: the higher-level samples fill more uniformly the space they occupy and the high-density manifolds tend to unfold when represented at higher levels. The paper discusses these hypotheses and tests them experimentally through visualization and measurements of mixing and interpolating between samples.

Paper Structure

This paper contains 11 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Sequences of $25$ samples generated with a CAE on TFD (rows 1 and 2) and with an RBM on MNIST (rows 3 and 4). On TFD, the second layer lets the chain move quickly from female faces (left) to male faces (right), passing through various facial expressions, whereas the first layer produces poor samples. On MNIST (bottom rows), the first-layer chain gets stuck on the same sample while the second-layer chain mixes among classes.
  • Figure 2: Linear interpolation between a data sample and its 200th (a) and 1st (b) nearest neighbor, at various depths (top row = input space, middle row = 1st layer, bottom row = 2nd layer). In each $3\times3$ block, the left and right columns are test examples and the middle column is the input-space image of the interpolated point. Interpolating at higher levels clearly gives more likely samples. In the raw input space especially (a, 2nd block), two overlapping mouths are visible, whereas only one mouth appears for the point interpolated at the 2nd layer. Interpolating with the 1st nearest neighbor shows no difference between the levels. In (c), we interpolate between samples of different classes, at different depths (top = raw input, middle = 1st layer, bottom = 2nd layer). Note how, at lower levels, the path passes through implausible patterns, whereas in the deeper layers it almost jumps from one high-density region (of one class) to another.
  • Figure 3: (a, b) Local Convex Hull: log-likelihoods computed w.r.t. samples linearly interpolated between an example and its k-NNs, for k between 1 and 500. The manifold is unfolded at deeper levels. (d, e) Local Convex Ball: log-likelihoods of samples generated by adding Gaussian noise to the representation at different levels ($\sigma\in[0.01,5]$). Good samples occupy more volume at deeper layers. (c, f) Mixing Histograms: number of classes visited (x-axis) over 10 samples (c), 20 samples (f) with CAE, 100 samples (f) with DBN.
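
The two measurements behind Figures 2 and 3 can be sketched in a few lines. The following is a minimal illustration, not the authors' code: `interpolate` and `classes_visited` are hypothetical helper names, and the `arcsinh`/`sinh` pair is only a toy invertible stand-in for a trained encoder and decoder (a DBN or CAE layer in the paper). It shows the mechanics of interpolating in a representation space and of counting how many classes a chain visits per window of consecutive samples.

```python
import numpy as np


def interpolate(x_a, x_b, encode, decode, num_steps=5):
    """Linearly interpolate between two inputs in representation space.

    `encode` and `decode` are assumed callables mapping inputs to a
    representation and back; passing identity functions for both gives
    plain input-space interpolation.
    """
    h_a, h_b = encode(x_a), encode(x_b)
    alphas = np.linspace(0.0, 1.0, num_steps)
    # Mix the two representations, then map each mixture back to input space.
    return np.stack([decode((1.0 - a) * h_a + a * h_b) for a in alphas])


def classes_visited(labels, window):
    """Count distinct class labels in each non-overlapping window of a chain."""
    labels = list(labels)
    return [
        len(set(labels[i:i + window]))
        for i in range(0, len(labels) - window + 1, window)
    ]


# Toy stand-in for a learned layer: an elementwise invertible nonlinearity.
x_a = np.array([0.0, 1.0, 2.0, 3.0])
x_b = np.array([3.0, 2.0, 1.0, 0.0])
path = interpolate(x_a, x_b, np.arcsinh, np.sinh, num_steps=5)

# Hypothetical class labels assigned to six consecutive chain samples.
visits = classes_visited([0, 0, 0, 1, 2, 2], window=3)
```

With a trained model in place of the toy functions, `path` would hold the decoded images of the interpolated points, and a histogram of `visits` over many windows would play the role of a mixing histogram; the labels would have to come from a classifier applied to the generated samples, which this sketch does not include.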