DualContrast: Unsupervised Disentangling of Content and Transformations with Implicit Parameterization
Mostofa Rafid Uddin, Min Xu
TL;DR
This work tackles unsupervised disentanglement of content (identity) and transformations (pose/state) in shape-focused scientific images where transformations lack explicit parameterization. It introduces DualContrast, a two-latent-variable VAE that jointly infers content codes $\mathbf{c}$ and transformation codes $\mathbf{z}$ without prespecifying a transformation model, trained with an ELBO objective and novel contrastive losses on both codes. By creating positive and negative pairs for content (via data augmentations) and for transformation (via latent-space generation and rotation-driven strategies), DualContrast achieves robust disentanglement on MNIST, LineMod, and realistic cryo-ET subtomogram datasets, including the first unsupervised separation of protein composition from conformations. The approach yields clearer latent-space clustering and improved content–transformation transfer, and it enables downstream analyses such as subtomogram averaging, which makes it practical for scientific imaging where explicit transformation models are unavailable.
Abstract
Unsupervised disentanglement of content and transformation is important for analyzing shape-focused scientific image datasets, because it is effective in solving downstream image-based shape-analysis tasks. Existing works address the problem by explicitly parameterizing the transformation latent codes in a generative model, which significantly reduces their expressiveness. Moreover, they are not applicable when transformations cannot be readily parameterized. An alternative to such explicit approaches is contrastive methods with data augmentation, which implicitly disentangle transformations and content. However, existing contrastive strategies are insufficient to this end. Therefore, we developed a novel contrastive method with generative modeling, DualContrast, specifically for unsupervised disentanglement of content and transformations in shape-focused image datasets. DualContrast creates positive and negative pairs for content and transformation from data and latent spaces. Our extensive experiments showcase the efficacy of DualContrast over existing self-supervised and explicit parameterization approaches. With DualContrast, we disentangled protein composition and conformations in cellular 3D protein images, which was unattainable with existing disentanglement approaches.
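The idea of applying a contrastive loss to both latent codes can be illustrated with a minimal sketch. This is not the loss used in DualContrast, whose exact form is not given in this summary; it assumes a generic hinge (triplet-style) loss on cosine similarity, and the names `pair_contrastive_loss`, `dual_contrastive_loss`, and `margin` are hypothetical. The content codes `c` and transformation codes `z` are assumed to be already produced by an encoder, with their positive and negative counterparts built as described above.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def pair_contrastive_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss (an assumed stand-in, not the paper's loss).

    Penalizes the batch unless the anchor is more similar to its
    positive than to its negative by at least `margin`.
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return float(np.mean(np.maximum(0.0, margin - sim_pos + sim_neg)))


def dual_contrastive_loss(c, c_pos, c_neg, z, z_pos, z_neg, margin=0.5):
    """Sum of one contrastive term on content codes and one on transformation codes."""
    content_term = pair_contrastive_loss(c, c_pos, c_neg, margin)
    transform_term = pair_contrastive_loss(z, z_pos, z_neg, margin)
    return content_term + transform_term


# Toy usage with random codes standing in for encoder outputs.
rng = np.random.default_rng(0)
c, c_pos, c_neg = (rng.normal(size=(4, 8)) for _ in range(3))
z, z_pos, z_neg = (rng.normal(size=(4, 8)) for _ in range(3))
loss = dual_contrastive_loss(c, c_pos, c_neg, z, z_pos, z_neg)
```

In a full model this term would be added to the VAE's ELBO objective; how the two are weighted is a design choice not specified here.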
