CoDeGAN: Contrastive Disentanglement for Generative Adversarial Network
Jiangwei Zhao, Zejia Liu, Xiaohan Guo, Lili Pan
TL;DR
CoDeGAN addresses discrete-factor disentanglement in GANs by replacing image-domain similarity with a feature-domain contrastive loss and by incorporating self-supervised pre-training to learn semantic representations. It introduces a class-related encoder $E_c$, an intra-class encoder $E_z$, and the corresponding losses $\mathcal{L}_c$ and $\mathcal{L}_{\boldsymbol{z}}$, optimized through alternating updates of the discriminator $D$, the generator $G$, and the encoder $E$. This improves training stability and disentanglement without heavy reliance on mutual information terms. Empirical results on MNIST, Fashion-MNIST, CIFAR-10, COIL-20, and 3D datasets show state-of-the-art disentanglement metrics (ACC, NMI, ARI) with competitive image quality (IS, FID). Self-supervised pre-training yields further gains, especially on the more challenging CIFAR-10. The findings suggest a practical and scalable path to unsupervised discrete-factor disentanglement in GANs, with multi-factor disentanglement left as future work.
Abstract
Disentanglement, a critical concern in interpretable machine learning, has also garnered significant attention from the computer vision community. Many existing unsupervised GAN-based class disentanglement approaches, such as InfoGAN and its variants, primarily aim to maximize the mutual information (MI) between the generated image and its latent codes. However, this objective can push the network to generate highly similar images whenever it is given the same latent class factor, potentially resulting in mode collapse or mode dropping. To alleviate this problem, we propose \texttt{CoDeGAN} (Contrastive Disentanglement for Generative Adversarial Networks), in which we relax the similarity constraints for disentanglement from the image domain to the feature domain. This modification not only enhances the stability of GAN training but also improves disentanglement. Moreover, we integrate self-supervised pre-training into CoDeGAN to learn semantic representations, which significantly facilitates unsupervised disentanglement. Extensive experimental results demonstrate the superiority of our method over state-of-the-art approaches across multiple benchmarks. The code is available at https://github.com/learninginvision/CoDeGAN.
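
To make the idea of a feature-domain similarity constraint concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss over encoder features. It is an illustration under stated assumptions, not the exact $\mathcal{L}_c$ of CoDeGAN: the function names `cosine` and `class_contrastive_loss`, the cosine similarity, and the temperature value are all hypothetical choices. Features of images generated from the same class code are treated as positives and all others as negatives, so the constraint acts on features rather than on pixels.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_contrastive_loss(features, codes, temperature=0.5):
    """InfoNCE-style loss (assumed form, not the paper's exact loss).

    features: list of feature vectors, e.g. encoder outputs for generated images
    codes:    list of discrete class codes used to generate those images
    Returns the mean loss over anchors that have at least one positive.
    """
    losses = []
    n = len(features)
    for i in range(n):
        pos_weight, total_weight = 0.0, 0.0
        for j in range(n):
            if j == i:
                continue  # an anchor is never compared with itself
            w = math.exp(cosine(features[i], features[j]) / temperature)
            total_weight += w
            if codes[j] == codes[i]:
                pos_weight += w  # same class code: positive pair
        if pos_weight > 0.0:
            losses.append(-math.log(pos_weight / total_weight))
    return sum(losses) / len(losses) if losses else 0.0


# Toy features: two tight clusters in a 2-D feature space.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = class_contrastive_loss(features, [0, 0, 1, 1])
loss_misaligned = class_contrastive_loss(features, [0, 1, 0, 1])
```

In this toy setting, `loss_aligned` is smaller than `loss_misaligned`, because the loss is low only when samples sharing a class code are also close in feature space. The two images themselves are never compared, which is the sense in which the constraint is relaxed from the image domain to the feature domain.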
