Disentanglement in Difference: Directly Learning Semantically Disentangled Representations by Maximizing Inter-Factor Differences
Xingshen Zhang, Lin Wang, Shuangrong Liu, Xintao Lu, Chaoran Pang, Bo Yang
TL;DR
The paper addresses the limitation that statistical independence does not guarantee semantic disentanglement in latent representations. It introduces Disentanglement in Difference (DiD), a framework that directly learns semantic differences between latent factors using a Difference Encoder and a contrastive objective that maximizes the distance between factor-induced representations. The architecture combines a Sample Generation Paradigm, a Difference Encoder, and a Samples Encoder, trained with the losses L_G, L_H, and L_enc under a WGAN-GP regime. Experiments on dSprites and 3DShapes report state-of-the-art disentanglement scores on the MIG, DCI-D, and SAP metrics, and ablations confirm that learning semantic differences is both necessary and effective.
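MIG (Mutual Information Gap) is one of the metrics on which the results are reported. The sketch below implements the standard MIG definition: for each ground-truth factor, take the gap between the two latent dimensions with the highest mutual information, normalize by the factor's entropy, and average over factors. It is not the paper's evaluation code; the equal-width binning with 20 bins and the function names are assumptions made for illustration.

```python
import numpy as np

def discrete_mutual_info(x, y):
    """Mutual information (nats) between two discrete 1-D arrays, from empirical counts."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, ny)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def entropy(x):
    """Empirical entropy (nats) of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mig(latents, factors, n_bins=20):
    """MIG score. latents: (N, J) continuous codes, J >= 2; factors: (N, K) discrete factors."""
    # Assumption: discretize each latent dimension into equal-width bins
    binned = np.stack(
        [np.digitize(z, np.histogram_bin_edges(z, bins=n_bins)[1:-1]) for z in latents.T],
        axis=1,
    )
    gaps = []
    for k in range(factors.shape[1]):
        mi = np.array([discrete_mutual_info(binned[:, j], factors[:, k])
                       for j in range(binned.shape[1])])
        top2 = np.sort(mi)[::-1][:2]  # two most informative latent dimensions
        gaps.append((top2[0] - top2[1]) / entropy(factors[:, k]))
    return float(np.mean(gaps))
```

A code in which each factor is captured by exactly one latent dimension scores close to 1, while a code unrelated to the factors scores close to 0.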
Abstract
In this study, Disentanglement in Difference (DiD) is proposed to address the inherent inconsistency between the statistical independence of latent variables and the goal of semantic disentanglement in disentangled representation learning. Conventional disentanglement methods obtain disentangled representations by increasing the statistical independence among latent variables. However, statistically independent latent variables are not necessarily semantically unrelated; consequently, improving statistical independence does not always improve disentanglement performance. To address this issue, DiD directly learns the semantic differences between latent variables rather than their statistical independence. In DiD, a Difference Encoder is designed to measure semantic differences, and a contrastive loss function is established to facilitate comparison across latent dimensions. Together, these components allow the model to directly distinguish and disentangle distinct semantic factors, thereby resolving the inconsistency between statistical independence and semantic disentanglement. Experimental results on the dSprites and 3DShapes datasets demonstrate that DiD outperforms existing mainstream methods across various disentanglement metrics.
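The abstract does not spell out the contrastive loss, so the following is only a minimal sketch of one plausible reading. It assumes the (hypothetical) Difference Encoder produces one embedding per sample pair that differs in a single latent dimension, and it swaps in a supervised-contrastive (SupCon-style) loss: embeddings caused by the same latent dimension are treated as positives and pulled together, while embeddings caused by different dimensions are pushed apart. The function name, the cosine similarity, and the temperature of 0.1 are all assumptions; the paper's actual objective may differ.

```python
import numpy as np

def difference_contrastive_loss(diff_emb, factor_ids, temperature=0.1):
    """SupCon-style loss over hypothetical Difference Encoder outputs.

    diff_emb:   (N, D) embeddings, one per sample pair differing in one latent dimension.
    factor_ids: (N,) index of the latent dimension that was changed for each pair.
    """
    # Cosine similarity between every pair of difference embeddings
    z = diff_emb / np.linalg.norm(diff_emb, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    n = len(factor_ids)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)  # an embedding is not compared with itself
    # Numerically stable log-softmax over all other embeddings
    row_max = np.max(logits, axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # Positives: other pairs whose change came from the same latent dimension
    pos_mask = (factor_ids[:, None] == factor_ids[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0
    mean_log_prob_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())
```

Under this reading, the loss is small when differences induced by the same latent dimension cluster together and differences induced by different dimensions point in distinct directions, which is one way to realize "maximizing inter-factor differences".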
