Disentanglement with Factor Quantized Variational Autoencoders
Gulcin Baykal, Melih Kandemir, Gozde Unal
TL;DR
The paper addresses unsupervised disentangled representation learning by proposing FactorQVAE, a discrete VAE that uses scalar quantization over a single global codebook and a total correlation (TC) regularizer to encourage independence among latent factors. It samples from a stochastic categorical posterior with the Gumbel-Softmax relaxation so that training stays differentiable, and it incorporates a TC-based constraint into the ELBO, enabling stable training and improved disentanglement without ground-truth factor labels. In extensive experiments on Shapes3D, Isaac3D, and MPI3D, FactorQVAE achieves higher DCI and InfoMEC scores than the compared methods while maintaining competitive reconstruction quality, and ablations show the benefits of scalar quantization and a global codebook over vector quantization and per-dimension codebooks. The work highlights the practical impact of combining discrete latent representations with factor-aware regularization and provides code for replication and further exploration.
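The quantization step summarized above can be illustrated with a minimal forward-pass sketch in plain Python. It assumes that the encoder emits one vector of logits per latent dimension and that every dimension chooses among the same K scalar entries of one shared (global) codebook. The function names, the temperature `tau`, and the example codebook values are hypothetical illustrations and are not taken from the FactorQVAE code. In a real model an autodiff framework would backpropagate through the relaxed sample, and the codebook entries would presumably be trainable; this sketch shows only the sampling arithmetic.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample from a categorical distribution with unnormalized logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: -log(-log(U)), with U ~ Uniform(0, 1) clamped away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_quantize(logits_per_dim, codebook, tau=1.0, rng=random):
    """Map each latent dimension to a (soft) scalar drawn from ONE shared codebook.

    Hypothetical helper: every dimension uses the same global codebook, as opposed
    to a separate codebook per dimension.
    """
    z = []
    for logits in logits_per_dim:
        if len(logits) != len(codebook):
            raise ValueError("each logit vector must have one entry per codebook value")
        weights = gumbel_softmax(logits, tau, rng)
        # Expected codebook value under the relaxed one-hot weights.
        z.append(sum(w * c for w, c in zip(weights, codebook)))
    return z


if __name__ == "__main__":
    codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical global codebook of scalars
    logits = [[0.1, 0.2, 5.0, 0.0, -1.0], [3.0, 0.0, 0.0, 0.0, 0.0]]
    print(scalar_quantize(logits, codebook, tau=0.5, rng=random.Random(0)))
```

As `tau` approaches zero the relaxed weights approach a hard one-hot vector, so each latent value approaches a single codebook entry; larger `tau` yields smoother interpolations between entries.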
Abstract
Disentangled representation learning aims to represent the underlying generative factors of a dataset in a latent representation, independently of one another. In this work, we propose a model based on a discrete variational autoencoder (VAE) that receives no ground-truth information about the generative factors. We demonstrate that learning discrete rather than continuous representations facilitates disentanglement. Furthermore, we propose incorporating an inductive bias into the model to further enhance disentanglement. Specifically, we quantize each latent variable of the latent representation to a scalar value from a global codebook, and we add a total correlation term to the optimization as an inductive bias. Our method, called FactorQVAE, combines optimization-based disentanglement approaches with discrete representation learning, and it outperforms prior disentanglement methods on two disentanglement metrics (DCI and InfoMEC) while also improving reconstruction performance. Our code can be found at https://github.com/ituvisionlab/FactorQVAE.
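The total correlation term mentioned above is, by definition, the KL divergence between the joint distribution of the latents and the product of its marginals, TC(z) = KL(q(z) || ∏_j q(z_j)); it is zero exactly when the latent variables are independent. The sketch below computes this quantity exactly for a small discrete joint distribution given as a table. This is only an illustration of the definition: in a real model the aggregate posterior is not available as a table, and this section does not state how FactorQVAE estimates the term. The function name and the dictionary input format are hypothetical.

```python
import math
from collections import defaultdict


def total_correlation(joint):
    """Exact TC = KL(q(z) || prod_j q(z_j)) in nats.

    Hypothetical input format: a dict mapping each discrete latent tuple z to its
    probability q(z); probabilities are assumed to sum to 1.
    """
    dims = len(next(iter(joint)))
    # Accumulate the marginal q(z_j) for every latent dimension j.
    marginals = [defaultdict(float) for _ in range(dims)]
    for z, p in joint.items():
        for j, value in enumerate(z):
            marginals[j][value] += p
    tc = 0.0
    for z, p in joint.items():
        if p == 0.0:
            continue  # zero-probability outcomes contribute nothing to the KL sum
        product_of_marginals = 1.0
        for j, value in enumerate(z):
            product_of_marginals *= marginals[j][value]
        tc += p * math.log(p / product_of_marginals)
    return tc


if __name__ == "__main__":
    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    correlated = {(0, 0): 0.5, (1, 1): 0.5}
    print(total_correlation(independent), total_correlation(correlated))
```

For two independent fair binary latents the TC is 0, while for two perfectly correlated binary latents it is log 2 nats, which is the kind of statistical dependence a TC penalty pushes the model away from.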
