Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs
Hee-Jun Jung, Jaehyoung Jeong, Kangil Kim
TL;DR
This work tackles unsupervised disentanglement in VAEs by introducing MIPE-Transformation, a two-part framework that (i) uses Invertible and Partial-Equivariant (IPE) transformations, which map latent vectors through a symmetric-matrix exponential that is invertible and preserves partial input-to-latent equivariance, and (ii) applies Exponential-Family Conversion (EF-conversion) to map the transformed latent variables to flexible, non-Gaussian priors via learnable natural parameters. The training loss combines an EF similarity loss, a KL divergence defined in the EF setting, and a KL calibration term with an implicit semantic mask, which lets the model learn latent distributions that are not known in advance. Integrating multiple IPE units into existing VAEs improves disentanglement metrics on the dSprites, 3D Shapes, and 3D Cars datasets, and ablations validate the contributions of symmetry, invertibility, and EF-conversion. The results suggest that MIPE-Transformation is a practical plug-in inductive bias for state-of-the-art VAEs: it pairs group-theoretic structure with a flexible, learnable prior, yielding more interpretable and reusable latent factors.
Abstract
Disentanglement learning is central to understanding and reusing learned representations in variational autoencoders (VAEs). Although equivariance has been explored in this context, effectively exploiting it for disentanglement remains challenging. In this paper, we propose a novel method, called Multiple Invertible and Partial-Equivariant Transformation (MIPE-Transformation), which integrates two main parts: (1) Invertible and Partial-Equivariant Transformation (IPE-Transformation), guaranteeing an invertible latent-to-transformed-latent mapping while preserving partial input-to-latent equivariance in the transformed latent space; and (2) Exponential-Family Conversion (EF-Conversion) to extend the standard Gaussian prior to an approximate exponential family via a learnable conversion. In experiments on the 3D Cars, 3D Shapes, and dSprites datasets, MIPE-Transformation improves the disentanglement performance of state-of-the-art VAEs.
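To illustrate why a symmetric-matrix exponential gives an invertible latent mapping, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `sym_expm`, the symmetrization `(W + W^T) / 2`, and the 4-dimensional latent are assumptions made for illustration only. The sketch relies on a standard fact: for a real symmetric matrix S, exp(S) is symmetric positive definite, and its inverse is exp(-S).

```python
import numpy as np


def sym_expm(weights):
    """Return exp(S) and exp(-S) for S = (W + W^T) / 2.

    Hypothetical helper: the symmetrization step is an assumption of this
    sketch, not a detail taken from the paper.
    """
    sym = 0.5 * (weights + weights.T)
    # A real symmetric matrix factors as V diag(lam) V^T with orthogonal V,
    # so exp(S) = V diag(exp(lam)) V^T and exp(-S) = V diag(exp(-lam)) V^T.
    eigvals, eigvecs = np.linalg.eigh(sym)
    forward = (eigvecs * np.exp(eigvals)) @ eigvecs.T
    inverse = (eigvecs * np.exp(-eigvals)) @ eigvecs.T
    return forward, inverse


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))      # stand-in for a learnable weight matrix
forward, inverse = sym_expm(W)

z = rng.normal(size=4)           # a latent vector from the encoder
z_hat = forward @ z              # transformed latent vector
z_rec = inverse @ z_hat          # recovered exactly, up to float error

print(np.allclose(z, z_rec))
```

Because every eigenvalue of exp(S) has the form exp(lam) > 0, the map can never become singular during training, so no extra constraint on the weights is needed to keep it invertible.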
