DualContrast: Unsupervised Disentangling of Content and Transformations with Implicit Parameterization

Mostofa Rafid Uddin; Min Xu

TL;DR

This work tackles unsupervised disentanglement of content (identity) and transformations (pose/state) in shape-focused scientific images where transformations lack an explicit parameterization. It introduces DualContrast, a two-latent-variable VAE that jointly infers content codes $\mathbf{c}$ and transformation codes $\mathbf{z}$ without prespecifying transformation models, trained with an ELBO objective and novel contrastive losses on both codes. By creating positive/negative pairs for content (via data augmentations) and for transformation (via latent-space generation and rotation-driven strategies), DualContrast achieves robust disentanglement across MNIST, LineMod, and realistic cryo-ET subtomogram datasets, including the first unsupervised separation of protein composition from conformations. The approach yields clearer latent-space clustering and improved content–transformation transfer, and it enables downstream analyses such as subtomogram averaging, which makes it practical for scientific imaging where explicit transformation models are unavailable.

Abstract

Unsupervised disentanglement of content and transformation is important for analyzing shape-focused scientific image datasets, given its efficacy in solving downstream image-based shape-analysis tasks. Existing works address the problem by explicitly parameterizing the transformation latent codes in a generative model, which significantly reduces their expressiveness. Moreover, they are not applicable when transformations cannot be readily parameterized. An alternative to such explicit approaches is contrastive learning with data augmentation, which implicitly disentangles transformations and content. However, existing contrastive strategies are insufficient for this purpose. Therefore, we developed DualContrast, a novel contrastive method with generative modeling, specifically for unsupervised disentanglement of content and transformations in shape-focused image datasets. DualContrast creates positive and negative pairs for content and transformation from the data and latent spaces. Our extensive experiments showcase the efficacy of DualContrast over existing self-supervised and explicit-parameterization approaches. With DualContrast, we disentangled protein composition and conformations in cellular 3D protein images, which was unattainable with existing disentanglement approaches.
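To make the training signal described above more concrete, the following is a minimal NumPy sketch of a VAE-style objective with a contrastive term on each of the two codes. It is an illustration under stated assumptions, not the paper's implementation: the cosine-similarity margin loss, the squared-error reconstruction term, the weight `w_con`, and all function names are hypothetical choices, since the exact loss forms are not given in this summary.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den

def pair_contrastive_loss(anchor, positive, negative, margin=0.5):
    """Assumed margin loss: pull positive pairs together, push negatives apart."""
    pos_term = 1.0 - cosine_sim(anchor, positive)
    neg_term = np.maximum(0.0, cosine_sim(anchor, negative) - margin)
    return float(np.mean(pos_term + neg_term))

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch."""
    per_sample = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    return float(np.mean(per_sample))

def total_loss(x, x_recon, c_stats, z_stats, c_triplet, z_triplet, w_con=1.0):
    """Negative-ELBO-style term plus contrastive terms on both codes.

    c_stats, z_stats: (mu, logvar) for the content and transformation codes.
    c_triplet, z_triplet: (anchor, positive, negative) code arrays.
    """
    recon = float(np.mean(np.sum((x - x_recon) ** 2, axis=1)))  # assumed squared error
    kl = gaussian_kl(*c_stats) + gaussian_kl(*z_stats)
    contrastive = pair_contrastive_loss(*c_triplet) + pair_contrastive_loss(*z_triplet)
    return recon + kl + w_con * contrastive
```

In this sketch the content triplet would come from augmented views of the data and the transformation triplet from codes generated in latent space, mirroring the pair-creation idea in the abstract; how those pairs are actually built is left to the full paper.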

Paper Structure (28 sections, 9 equations, 17 figures, 2 tables)

Introduction
Related Works
Method
Disentangling Content and Transformation
Method Overview and Notation
Variational Inference of Content and Transformation Codes
Creating Contrastive Pair with respect to Content
Creating Contrastive Pair with respect to Transformation
Experiments & Results
DualContrast disentangles several writing styles from MNIST images
DualContrast Disentangles ViewPoint from Linemod Object Dataset
DualContrast disentangles protein composition from conformations in Cryo-ET subtomograms and enables their precise identification
Discussions & Limitations
Conclusion
Acknowledgement
...and 13 more sections

Figures (17)

Figure 1: (a) The concept of content-transformation disentanglement, where changing the content changes the protein identity and changing the transformation changes the state of the protein. Explicit methods (b) use the transformation space to infer a fixed parameter set, whereas our implicit method (c) does not restrict the transformations to a fixed set of parameters. The figure uses toy protein images for visualization; they are not used for experiments in this form.
Figure 2: Our proposed contrastive learning-based unsupervised content-transformation disentanglement pipeline. (a) The variational inference of content and transformation codes with $L_\text{VAE}$. (b) Contrastive pair creation strategy for content and transformation codes. The process is delineated in the bottom panel. Additional visualization is available in Appendix Fig. \ref{fig:contrastive}. (c) Contrastive losses. In DualContrast, the contrastive pair creation and reconstruction happen simultaneously, and the encoder and decoder networks are optimized with both contrastive and reconstruction losses in each iteration.
Figure 3: Qualitative results of unsupervised $\textbf{c}$-$\textbf{z}$ disentanglement on MNIST obtained by (a) Harmony, (b) SpatialVAE, (c) VITAE, and (d) DualContrast, respectively. Images are generated by the decoders given the content ($\textbf{c}$) code from the leftmost column images and the transformation ($\textbf{z}$) code from the topmost row images.
Figure 4: Qualitative results of unsupervised $\textbf{c}$-$\textbf{z}$ disentanglement on LineMod obtained by (a) Harmony, (b) SpatialVAE, (c) VITAE, and (d) DualContrast, respectively. Images are generated by the decoders given the $\textbf{c}$ code from the leftmost column images and the $\textbf{z}$ code from the topmost row images. Additional content-transformation transfer results are available in Appendix Fig. \ref{fig:supp:linemod}.
Figure 5: Disentanglement of composition and conformations in the cellular subtomogram dataset, with slice-by-slice visualization of $4$ sample subtomograms. UMAP embedding of $\textbf{c}$ codes in (a) SpatialVAE, (b) Harmony, and (c) DualContrast. (d) Slice-by-slice visualization of x-y slices in $4$ sample subtomograms. (e) UMAP embedding of $\textbf{c}$ codes in Harmony trained only on nucleosome subtomograms. (f) UMAP embedding of $\textbf{z}$ codes in Harmony trained with all subtomograms.
...and 12 more figures
