Variational Inference of Disentangled Latent Concepts from Unlabeled Observations
Abhishek Kumar, Prasanna Sattigeri, Avinash Balakrishnan
TL;DR
The paper studies unsupervised learning of disentangled latent factors from unlabeled data and proposes DIP-VAE (Disentangled Inferred Prior VAE), a variational framework that adds a covariance-based regularizer on the inferred prior (the approximate posterior averaged over the data) to encourage factorized latents without sacrificing data likelihood. It introduces two variants (DIP-VAE-I and DIP-VAE-II) and the SAP (Separated Attribute Predictability) score for evaluating disentanglement, arguing that SAP aligns better than prior metrics with the qualitative disentanglement seen in the decoder's output. Empirical results on CelebA, 2D Shapes, and 3D Chairs show improved disentanglement with strong reconstructions, with DIP-VAE-II offering the best trade-off. The main contributions are a regularized variational objective for learning interpretable latent factors and a disentanglement metric that can be computed whenever ground-truth factors are available.
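To make the covariance-based regularizer concrete, here is a minimal NumPy sketch of the penalty estimated from one minibatch. It assumes a Gaussian encoder with diagonal covariance. The function name `dip_regularizer` and the default weights are hypothetical, chosen for illustration rather than taken from the paper's code or experiments. Passing only the encoder means corresponds to the DIP-VAE-I form, which penalizes the covariance of the means. Passing the encoder variances as well corresponds to the DIP-VAE-II form, which penalizes the covariance of z under the inferred prior.

```python
import numpy as np

def dip_regularizer(mu, var=None, lambda_od=10.0, lambda_d=100.0):
    """Covariance penalty on the inferred prior, estimated from a minibatch.

    mu  : (batch, d) encoder means mu_phi(x)
    var : (batch, d) encoder variances (diagonal of Sigma_phi(x)), or None
    lambda_od, lambda_d : illustrative weights (assumed values, not from the paper)
    """
    mu = np.asarray(mu, dtype=float)
    centered = mu - mu.mean(axis=0, keepdims=True)
    # Minibatch estimate of Cov_{p(x)}[mu_phi(x)]
    cov = centered.T @ centered / mu.shape[0]
    if var is not None:
        # DIP-VAE-II form: Cov_{q(z)}[z] = E[Sigma_phi(x)] + Cov[mu_phi(x)]
        cov = cov + np.diag(np.asarray(var, dtype=float).mean(axis=0))
    diag = np.diag(cov)
    off_diag = cov - np.diag(diag)
    # Push off-diagonal entries toward 0 and diagonal entries toward 1
    return lambda_od * np.sum(off_diag ** 2) + lambda_d * np.sum((diag - 1.0) ** 2)
```

In training, this term would be added to the negative evidence lower bound, so the penalty vanishes only when the estimated covariance matches the identity.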
Abstract
Disentangled representations, where the higher-level data generative factors are reflected in disjoint latent dimensions, offer several benefits such as ease of deriving invariant representations, transferability to other tasks, and interpretability. We consider the problem of unsupervised learning of disentangled representations from a large pool of unlabeled observations, and propose a variational-inference-based approach to infer disentangled latent factors. We introduce a regularizer on the expectation of the approximate posterior over observed data that encourages disentanglement. We also propose a new disentanglement metric that is better aligned with the qualitative disentanglement observed in the decoder's output. We empirically observe significant improvement over existing methods in terms of both disentanglement and data likelihood (reconstruction quality).
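The proposed metric, the SAP score, can be sketched for the simple case where every ground-truth factor is continuous. For each factor, each single latent dimension is scored by the R² of a one-variable linear fit, and the score for that factor is the gap between the best and second-best latent. The final value is the mean gap over factors. The function name `sap_score` is hypothetical, and this sketch omits the classification variant (balanced accuracy from thresholding a latent) used for discrete factors.

```python
import numpy as np

def sap_score(latents, factors):
    """Regression-only sketch of the SAP score.

    latents : (n, d) encoder means, with d >= 2
    factors : (n, k) continuous ground-truth generative factors
    """
    latents = np.asarray(latents, dtype=float)
    factors = np.asarray(factors, dtype=float)
    d, k = latents.shape[1], factors.shape[1]
    scores = np.zeros((d, k))
    for i in range(d):
        for j in range(k):
            z, y = latents[:, i], factors[:, j]
            var_z, var_y = z.var(), y.var()
            # R^2 of a univariate linear fit equals the squared correlation;
            # constant latents or factors keep a score of 0
            if var_z > 1e-12 and var_y > 1e-12:
                cov_zy = np.mean((z - z.mean()) * (y - y.mean()))
                scores[i, j] = cov_zy ** 2 / (var_z * var_y)
    # Two largest scores per factor (rows sorted ascending along the latent axis)
    top_two = np.sort(scores, axis=0)[-2:, :]
    return float(np.mean(top_two[1] - top_two[0]))
```

A score near 1 means each factor is predicted well by exactly one latent dimension, while a score near 0 means at least two latents carry similar information about each factor.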
