Variational Inference of Disentangled Latent Concepts from Unlabeled Observations

Abhishek Kumar, Prasanna Sattigeri, Avinash Balakrishnan

TL;DR

The paper addresses unsupervised learning of disentangled latent factors from unlabeled data and proposes DIP-VAE, a variational framework that adds a covariance-based regularizer on the inferred prior to encourage factorized latents without sacrificing data likelihood. It introduces two variants (DIP-VAE-I and DIP-VAE-II) and the Separated Atomic Predictability (SAP) score for evaluating disentanglement, arguing that SAP aligns better than prior metrics with the disentanglement observed in the decoder's output. Empirical results on CelebA, 2D Shapes, and 3D Chairs show improved disentanglement with strong reconstructions, with DIP-VAE-II offering the best trade-off. The work thus contributes both a method for learning interpretable latent factors and a metric for evaluating them.

Abstract

Disentangled representations, where the higher-level data generative factors are reflected in disjoint latent dimensions, offer several benefits such as ease of deriving invariant representations, transferability to other tasks, and interpretability. We consider the problem of unsupervised learning of disentangled representations from a large pool of unlabeled observations, and propose a variational-inference-based approach to infer disentangled latent factors. We introduce a regularizer on the expectation of the approximate posterior over observed data that encourages disentanglement. We also propose a new disentanglement metric that is better aligned with the qualitative disentanglement observed in the decoder's output. We empirically observe significant improvement over existing methods in terms of both disentanglement and data likelihood (reconstruction quality).
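To make the regularizer described above more concrete, here is a minimal NumPy sketch of a covariance penalty on the inferred prior, under stated assumptions. It assumes a Gaussian encoder with diagonal covariance that outputs per-sample means `mu` and variances `var`, and it estimates the covariance from a single minibatch. The function name `dip_regularizer`, the argument names, and the batch estimator are illustrative choices, not taken from this page; `lambda_od` weights the off-diagonal terms (as in the figure captions) and `lambda_d` is assumed to weight the deviation of the diagonal from 1. Variant "i" penalizes the covariance of the encoder means; variant "ii" adds the average encoder variance to approximate the covariance of the aggregated posterior.

```python
import numpy as np


def dip_regularizer(mu, var, lambda_od, lambda_d, variant="i"):
    """Covariance penalty pushing the inferred prior toward identity covariance.

    mu, var: arrays of shape (batch, latent_dim) holding encoder means and
    diagonal variances (assumed Gaussian encoder; names are illustrative).
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)

    # Minibatch estimate of Cov[mu(x)] (biased estimator, divides by batch size).
    centered = mu - mu.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / mu.shape[0]

    if variant == "ii":
        # Add E[Sigma(x)]: for a diagonal encoder this only affects the diagonal.
        cov = cov + np.diag(var.mean(axis=0))
    elif variant != "i":
        raise ValueError("variant must be 'i' or 'ii'")

    diag = np.diag(cov)
    off_diag = cov - np.diag(diag)

    # Penalize off-diagonal entries and the deviation of the diagonal from 1.
    return lambda_od * np.sum(off_diag ** 2) + lambda_d * np.sum((diag - 1.0) ** 2)
```

In a training loop this penalty would be added to the negative ELBO; an automatic-differentiation framework would be needed for actual optimization, so the NumPy version only illustrates the quantity being computed.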

Paper Structure

This paper contains 10 sections, 9 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Proposed Separated Atomic Predictability (SAP) score and the Z-diff disentanglement score [higgins2016beta] as a function of average reconstruction error (per pixel) on the test set of 2D Shapes data for $\beta$-VAE and the proposed DIP-VAE. The plots are generated by varying $\beta$ for $\beta$-VAE, and $\lambda_{od}$ for DIP-VAE-I and DIP-VAE-II (the number next to each point is the value of these hyperparameters, respectively).
  • Figure 2: The proposed SAP score and the Z-diff score [higgins2016beta] as a function of average reconstruction error (per pixel) on the test set of CelebA data for $\beta$-VAE and the proposed DIP-VAE. The plots are generated by varying $\beta$ for $\beta$-VAE, and $\lambda_{od}$ for DIP-VAE-I and DIP-VAE-II (the number next to each point is the value of these hyperparameters, respectively).
  • Figure 3: Qualitative results for disentanglement in 2D Shapes dataset [dsprites17]. SAP scores, Z-diff scores and reconstruction errors for the methods (rows) can be read from Fig. 1.
  • Figure 4: Qualitative results for disentanglement in 2D Shapes dataset [dsprites17] for DIP-VAE-I (SAP score $0.1889$).
  • Figure 5: Qualitative results for disentanglement in CelebA dataset.
  • ...and 1 more figure
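The SAP score that appears in the figure captions above can be sketched as follows, under stated assumptions. The sketch covers continuous generative factors only and scores each (latent, factor) pair by the R² of a one-dimensional linear regression, which equals the squared correlation. For each factor it then takes the gap between the two best-scoring latents and averages the gaps over factors. The function name `sap_score_continuous`, the handling of zero-variance columns (scored as 0), and the omission of discrete factors are simplifications of this sketch, not details given on this page.

```python
import numpy as np


def sap_score_continuous(latents, factors):
    """Mean gap between the top two per-latent R^2 scores for each factor.

    latents: array of shape (n_samples, n_latents), n_latents >= 2.
    factors: array of shape (n_samples, n_factors), continuous ground truth.
    """
    z = np.asarray(latents, dtype=float)
    f = np.asarray(factors, dtype=float)
    if z.shape[1] < 2:
        raise ValueError("need at least two latent dimensions")

    zc = z - z.mean(axis=0)
    fc = f - f.mean(axis=0)

    # Squared correlation = R^2 of regressing one factor on one latent.
    cov = zc.T @ fc / z.shape[0]                    # (n_latents, n_factors)
    denom = np.outer(zc.var(axis=0), fc.var(axis=0))
    scores = np.zeros_like(cov)
    np.divide(cov ** 2, denom, out=scores, where=denom > 0)

    # For each factor (column), subtract the runner-up score from the best one.
    top_two = np.sort(scores, axis=0)[-2:, :]
    return float(np.mean(top_two[1] - top_two[0]))
```

A score near 1 means each factor is captured by a single latent dimension with no close runner-up; a score near 0 means at least two latents predict each factor about equally well.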