Disentangling Factors of Variation via Generative Entangling
Guillaume Desjardins, Aaron Courville, Yoshua Bengio
TL;DR
The paper introduces a higher-order spike-and-slab restricted Boltzmann machine (ssRBM) that uses four-way multiplicative interactions among groups of latent variables to model how factors of variation are entangled in the data, so that inference disentangles them. The model is trained entirely by unsupervised maximum likelihood. Using a block-wise, multi-way pooling scheme and variational mean-field inference, it learns to separate factors such as emotion and identity without labels. Experiments on synthetic data and the Toronto Face Dataset show that the approach can produce interpretable, disentangled representations and improve emotion-recognition performance relative to non-disentangled baselines, with results competitive with supervised and other unsupervised methods. The work also points toward deep, layered disentangling by stacking such blocks, which would maintain local coherence while progressively uncovering higher-level, nonlocal factors.
Abstract
Here we propose a novel model family with the objective of learning to disentangle the factors of variation in data. Our approach is based on the spike-and-slab restricted Boltzmann machine, which we generalize to include higher-order interactions among multiple latent variables. Seen from a generative perspective, the multiplicative interactions emulate the entangling of factors of variation. Inference in the model can be seen as disentangling these generative factors. Unlike previous attempts at disentangling latent factors, the proposed model is trained using no supervised information regarding the latent factors. We apply our model to the task of facial expression classification.
