Uniform Transformation: Refining Latent Representation in Variational Autoencoders
Ye Shi, C. S. George Lee
TL;DR
The paper tackles irregular latent posteriors in VAEs that hinder disentanglement and reliable sampling by introducing a three-stage Uniform Transformation (UT) module. The UT pipeline combines Gaussian Kernel Density Estimation (G-KDE) clustering to identify child Gaussians, Gaussian Mixture (GM) modeling to represent the posterior, and Probability Integral Transform (PIT) to map the GM to a uniform latent distribution, thereby reducing posterior collapse and improving interpretability. Empirical results on dSprites and MNIST show enhanced disentanglement metrics for several baseline VAEs, with the UT approach yielding sharper reconstructions and better capture of periodic features, while remaining computationally feasible. This framework offers a modular path to more structured latent spaces, potentially extending to richer datasets and downstream tasks where disentanglement and robust sampling are critical.
Abstract
Irregular distributions in the latent space cause posterior collapse, misalignment between the posterior and the prior, and ill-sampling problems in Variational Autoencoders (VAEs). In this paper, we introduce a novel, adaptable three-stage Uniform Transformation (UT) module, consisting of Gaussian Kernel Density Estimation (G-KDE) clustering, non-parametric Gaussian Mixture (GM) modeling, and Probability Integral Transform (PIT), to address irregular latent distributions. By reconfiguring irregular distributions into a uniform distribution in the latent space, our approach significantly enhances the disentanglement and interpretability of latent representations, overcoming the limitations of traditional VAE models in capturing complex data structures. Empirical evaluations demonstrate the efficacy of the proposed UT module in improving disentanglement metrics on two benchmark datasets, dSprites and MNIST. Our findings suggest a promising direction for advancing representation learning techniques, with implications for future research on extending this framework to more sophisticated datasets and downstream tasks.
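To make the final PIT stage concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes the first two stages have already produced a Gaussian mixture for a single latent dimension. The mixture parameters (`WEIGHTS`, `MEANS`, `SIGMAS`) and all function names are hypothetical and chosen only for illustration. The sketch relies on the standard fact that passing a sample through the cumulative distribution function of its own distribution yields a value that is uniform on [0, 1].

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, weights, means, sigmas):
    """CDF of a 1-D Gaussian mixture: weighted sum of the component CDFs."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sigmas))


def sample_mixture(n, weights, means, sigmas, rng):
    """Draw n samples: pick a component by weight, then sample from it."""
    idx = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[i], sigmas[i]) for i in idx]


# Hypothetical mixture parameters (illustrative only, not taken from the paper).
WEIGHTS = [0.3, 0.5, 0.2]
MEANS = [-2.0, 0.5, 3.0]
SIGMAS = [0.4, 0.7, 0.3]

rng = random.Random(0)

# Stand-in for irregular (multi-modal) latent codes along one dimension.
z = sample_mixture(20000, WEIGHTS, MEANS, SIGMAS, rng)

# PIT: map each latent code through the mixture CDF.
u = [mixture_cdf(v, WEIGHTS, MEANS, SIGMAS) for v in z]

# A Uniform(0, 1) variable has mean 1/2 and variance 1/12.
mean_u = sum(u) / len(u)
var_u = sum((v - mean_u) ** 2 for v in u) / len(u)
print(f"mean={mean_u:.3f} (uniform: 0.500), var={var_u:.4f} (uniform: {1 / 12:.4f})")
```

In this toy setting the transformed values are close to uniform because the mixture used for the CDF matches the sampling distribution exactly. In practice the mixture would be estimated from encoder outputs, so the uniformity of the result depends on how well the fitted mixture matches the true posterior.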
