Better Mixing via Deep Representations
Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai
TL;DR
The paper addresses slow mixing of Markov chains in generative models by asking whether deeper representations, which better disentangle the underlying factors of variation, make those chains mix faster. It evaluates Deep Belief Networks (stacks of RBMs) and Contractive Auto-Encoders (CAEs) on MNIST and the Toronto Face Database (TFD). Sampling in higher-level representation spaces yields faster mixing, higher-quality samples, and interpolations between data points that stay closer to plausible examples. The authors attribute these effects to a hypothesized mechanism: disentangling unfolds the high-density manifolds and expands the high-density regions so that they fill the representation space more uniformly. They also show that discriminative performance does not degrade, and can even improve, with depth. The work suggests practical benefits for MCMC-based learning and generation, and points to a possible synergy with tempering-based methods for further speedups.
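To make "mixing" concrete: a binary RBM is sampled by alternating (block) Gibbs updates between its visible and hidden layers, and mixing refers to how quickly such a chain moves between different regions of the input space. The sketch below runs one chain in a single toy RBM; in a DBN the same alternating updates would run in the top-level RBM, with samples then propagated down to input space. It assumes binary units and made-up random parameters. The names `gibbs_chain` and `distinct_states` are hypothetical, and counting distinct visited states is only a crude illustrative proxy, not the paper's mixing measure.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample_hidden(v, W, c, rng):
    # Sample h_j ~ Bernoulli(sigmoid(c_j + sum_i v_i * W[i][j])); W is n_visible x n_hidden.
    return [
        1 if rng.random() < sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) else 0
        for j in range(len(c))
    ]

def sample_visible(h, W, b, rng):
    # Sample v_i ~ Bernoulli(sigmoid(b_i + sum_j W[i][j] * h_j)).
    return [
        1 if rng.random() < sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) else 0
        for i in range(len(b))
    ]

def gibbs_chain(v0, W, b, c, n_steps, seed=0):
    # Block Gibbs sampling v -> h -> v -> ...; returns every visible state, including v0.
    rng = random.Random(seed)
    v = list(v0)
    chain = [v]
    for _ in range(n_steps):
        h = sample_hidden(v, W, c, rng)
        v = sample_visible(h, W, b, rng)
        chain.append(v)
    return chain

def distinct_states(chain):
    # Crude mixing proxy (illustrative only): number of different visible states visited.
    return len({tuple(s) for s in chain})

# Tiny RBM with 6 visible and 3 hidden units; parameters are random, not trained.
rng = random.Random(42)
W = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
b = [0.0] * 6
c = [0.0] * 3
chain = gibbs_chain([0] * 6, W, b, c, n_steps=100)
print(len(chain), distinct_states(chain))
```

A slowly mixing chain would keep revisiting a few nearby states; the paper's claim is that running the chain in a better-disentangled, higher-level space reduces this effect.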
Abstract
It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation. We study the following related conjecture: better representations, in the sense of better disentangling, can be exploited to produce faster-mixing Markov chains. Consequently, mixing would be more efficient at higher levels of representation. To better understand why and how this is happening, we propose a secondary conjecture: the higher-level samples fill the space they occupy more uniformly, and the high-density manifolds tend to unfold when represented at higher levels. The paper discusses these hypotheses and tests them experimentally through visualization and through measurements of mixing and of interpolation between samples.
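The interpolation experiments compare moving between two inputs directly in input (pixel) space with moving along a straight line in a learned representation space and decoding back. A minimal sketch, assuming a single-layer tied-weight sigmoid auto-encoder as a stand-in for one trained CAE layer; `encode`, `decode`, `lerp`, and `interpolate` are hypothetical names, and the hand-picked weights are untrained placeholders.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def encode(x, W, c):
    # h_j = sigmoid(c_j + sum_i W[j][i] * x_i); W is n_hidden x n_visible.
    return [sigmoid(c[j] + sum(W[j][i] * x[i] for i in range(len(x)))) for j in range(len(c))]

def decode(h, W, b):
    # Tied weights: the reconstruction uses the transpose of W.
    return [sigmoid(b[i] + sum(W[j][i] * h[j] for j in range(len(h)))) for i in range(len(b))]

def lerp(u, v, t):
    # Pointwise linear interpolation between two equal-length vectors.
    return [(1 - t) * ui + t * vi for ui, vi in zip(u, v)]

def interpolate(x1, x2, W, b, c, n_frames=5, in_representation=True):
    # n_frames points from x1 to x2: either a straight line in input space,
    # or a straight line in h-space mapped back through the decoder.
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    h1, h2 = encode(x1, W, c), encode(x2, W, c)
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)
        if in_representation:
            frames.append(decode(lerp(h1, h2, t), W, b))
        else:
            frames.append(lerp(x1, x2, t))
    return frames

# Toy example: 4 inputs, 2 hidden units, hand-picked (untrained) weights.
W = [[1.0, -1.0, 0.5, 0.0], [0.0, 0.5, -1.0, 1.0]]
b = [0.0] * 4
c = [0.0] * 2
x1, x2 = [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]
pixel_path = interpolate(x1, x2, W, b, c, in_representation=False)
latent_path = interpolate(x1, x2, W, b, c, in_representation=True)
print(pixel_path[2])  # midpoint in input space: [0.5, 0.5, 0.5, 0.5]
```

In input space the midpoint is a plain average of the two endpoints (for images, a superposition of both); along the latent path every frame is a decoder output. The paper's hypothesis is that with a well-trained deep representation these decoded frames stay closer to plausible data.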
