Disentangling by Factorising

Hyunjik Kim, Andriy Mnih

TL;DR

The paper tackles unsupervised disentanglement with FactorVAE, which adds a Total Correlation (TC) penalty to the VAE objective to encourage a factorial latent distribution; the penalty is estimated with a discriminator via density-ratio estimation. FactorVAE achieves better disentanglement than $\beta$-VAE at comparable reconstruction quality. The paper also proposes a more robust disentanglement metric that avoids the principal failure modes of prior metrics. In experiments on synthetic and real datasets, FactorVAE shows stronger latent-factor separation and more stable training than InfoGAN variants. The authors discuss the limitations of TC-based disentanglement and point to future work on discrete factors and mixed latent types, with implications for more controllable and transferable generative models.

Abstract

We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon $\beta$-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.
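
For orientation, the objective summarised above can be written out as a short math sketch. The notation is assumed here rather than taken from this page: $\gamma$ is the weight of the Total Correlation penalty, $\bar{q}(z)$ is the product of the marginals of the code distribution $q(z)$, and $D(z)$ is the discriminator's estimate of the probability that $z$ was drawn from $q(z)$ rather than from $\bar{q}(z)$.

```latex
% VAE evidence lower bound minus a weighted Total Correlation penalty
\frac{1}{N}\sum_{i=1}^{N}\Big[
  \mathbb{E}_{q(z|x^{(i)})}\big[\log p(x^{(i)}|z)\big]
  - \mathrm{KL}\big(q(z|x^{(i)}) \,\|\, p(z)\big)
\Big]
- \gamma\,\mathrm{KL}\big(q(z) \,\|\, \bar{q}(z)\big),
\qquad \bar{q}(z) = \prod_{j=1}^{d} q(z_j)

% Density-ratio approximation of the penalty using the discriminator D
\mathrm{KL}\big(q(z) \,\|\, \bar{q}(z)\big)
  = \mathbb{E}_{q(z)}\Big[\log \frac{q(z)}{\bar{q}(z)}\Big]
  \approx \mathbb{E}_{q(z)}\Big[\log \frac{D(z)}{1 - D(z)}\Big]
```

The approximation in the second line holds to the extent that the discriminator is close to optimal, in which case $D(z)/(1-D(z))$ approaches the density ratio $q(z)/\bar{q}(z)$.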

Paper Structure

This paper contains 16 sections, 1 theorem, 20 equations, 40 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

$\mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x) \,\|\, p(z))] = I_q(x;z) + \mathrm{KL}(q(z) \,\|\, p(z))$, where $q(x,z) = p_{\text{data}}(x)\,q(z|x)$.
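
The identity in Lemma 1 can be checked numerically on a small discrete example. The sketch below is only an illustration: the three-point data distribution, the conditional table and the uniform prior are made-up numbers, and the helper `kl` is a hypothetical name introduced here. It uses the fact that $I_q(x;z) = \mathbb{E}_{p_{data}(x)}[KL(q(z|x)||q(z))]$ with $q(z) = \sum_x p_{data}(x)\,q(z|x)$.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy discrete setup (all numbers are illustrative assumptions).
p_data = [0.5, 0.3, 0.2]                 # p_data(x) over 3 inputs
q_z_given_x = [[0.7, 0.2, 0.1],          # q(z|x) for each x, over 3 codes
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4]]
prior = [1 / 3, 1 / 3, 1 / 3]            # p(z)

# Aggregate posterior q(z) = sum_x p_data(x) q(z|x).
q_z = [sum(px * row[k] for px, row in zip(p_data, q_z_given_x))
       for k in range(len(prior))]

# Left-hand side: E_{p_data(x)}[KL(q(z|x) || p(z))].
lhs = sum(px * kl(row, prior) for px, row in zip(p_data, q_z_given_x))

# Mutual information I_q(x; z) = E_{p_data(x)}[KL(q(z|x) || q(z))].
mutual_info = sum(px * kl(row, q_z) for px, row in zip(p_data, q_z_given_x))

# Right-hand side: I_q(x; z) + KL(q(z) || p(z)).
rhs = mutual_info + kl(q_z, prior)

lemma_gap = abs(lhs - rhs)
print(lemma_gap)  # should be zero up to floating-point rounding
```

The two sides agree because $\log\frac{q(z|x)}{p(z)} = \log\frac{q(z|x)}{q(z)} + \log\frac{q(z)}{p(z)}$, and taking the expectation under $q(x,z)$ turns the two terms into the mutual information and the marginal KL respectively.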

Figures (40)

  • Figure 1: Architecture of FactorVAE, a Variational Autoencoder (VAE) that encourages the code distribution to be factorial. The top row is a VAE with convolutional encoder and decoder, and the bottom row is an MLP classifier, the discriminator, that distinguishes whether the input was drawn from the marginal code distribution or the product of its marginals.
  • Figure 2: Top: Metric of Higgins et al. (2016). Bottom: Our new metric, where $s \in \mathbb{R}^d$ is the scale (empirical standard deviation) of latent representations of the full data (or large enough random subset).
  • Figure 3: A $\beta$-VAE model trained on the 2D Shapes data that scores 100% on the metric of Higgins et al. (2016) (ignoring the shape factor). First row: originals. Second row: reconstructions. Remaining rows: reconstructions of latent traversals. The model only uses three latent units to capture $x$-position, $y$-position and scale, and ignores orientation, yet achieves a perfect score on the metric.
  • Figure 4: Reconstruction error (top), metric of Higgins et al. (2016) (middle), our metric (bottom). $\beta$-VAE (left), FactorVAE (right). The colours correspond to different values of $\beta$ and $\gamma$ respectively, and confidence intervals are over 10 random seeds.
  • Figure 5: Reconstruction error plotted against our disentanglement metric, both averaged over 10 random seeds at the end of training. The numbers at each point are values of $\beta$ and $\gamma$. Note that we want low reconstruction error and a high disentanglement metric.
  • ...and 35 more figures
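
The discriminator in Figure 1 needs samples from the product of the marginals as well as from the code distribution itself. A minimal sketch of one way to approximate the former, assuming a batch of codes sampled from $q(z)$ is available: shuffle each latent dimension independently across the batch. The function name `permute_dims`, the pure-Python list representation and the toy batch are illustrative choices made here, not details taken from this page.

```python
import random

def permute_dims(codes, rng):
    """Shuffle each latent dimension independently across the batch.

    codes: list of B code vectors (each a list of d floats), assumed to be
    samples from q(z). Every per-dimension marginal of the batch is kept
    intact, but dependence between dimensions is broken, so the rows of the
    result are approximate samples from the product of marginals.
    """
    batch_size = len(codes)
    dim = len(codes[0])
    permuted = [[0.0] * dim for _ in range(batch_size)]
    for j in range(dim):
        # Draw a fresh permutation of the batch indices for dimension j.
        order = list(range(batch_size))
        rng.shuffle(order)
        for i, src in enumerate(order):
            permuted[i][j] = codes[src][j]
    return permuted

rng = random.Random(0)
# Toy batch with perfectly dependent dimensions: z_2 = z_1.
batch = [[float(i), float(i)] for i in range(8)]
shuffled = permute_dims(batch, rng)
for row in shuffled:
    print(row)
```

Each column of `shuffled` contains the same values as the corresponding column of `batch`, only reordered, while the input batch is left unmodified. The approximation improves with batch size, since a small batch can leave residual dependence between dimensions.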

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Remark
  • proof