Disentanglement in Difference: Directly Learning Semantically Disentangled Representations by Maximizing Inter-Factor Differences

Xingshen Zhang, Lin Wang, Shuangrong Liu, Xintao Lu, Chaoran Pang, Bo Yang

TL;DR

The paper addresses the limitation that statistical independence does not guarantee semantic disentanglement in latent representations. It introduces Disentanglement in Difference (DiD), a framework that directly learns semantic differences between latent factors using a Difference Encoder and a contrastive objective that maximizes the distance between factor-induced representations. The architecture combines a Sample Generation Paradigm, a Difference Encoder, and a Samples Encoder, trained with the losses L_G, L_H, and L_enc under a WGAN-GP regime. Empirical results on dSprites and 3DShapes show state-of-the-art disentanglement scores on MIG, DCI-D, and SAP, and ablations confirm the necessity and effectiveness of learning semantic differences.
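To make the contrastive objective more concrete, here is a minimal sketch under explicit assumptions. It uses a generic InfoNCE loss as a stand-in for the paper's L_H, whose exact form is not given here. It also assumes that a hypothetical Difference Encoder has already produced one embedding per sample pair, where each pair differs in a single latent dimension k. The helper names `cosine` and `info_nce_difference_loss` and the `temperature` value are illustrative and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def info_nce_difference_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE over per-dimension difference embeddings (assumed stand-in).

    anchors[k] and positives[k] are hypothetical Difference Encoder outputs for
    two sample pairs that differ only in latent dimension k. Row k treats
    positives[k] as its positive and positives[j], j != k, as negatives, so
    minimizing the loss pulls same-dimension differences together and pushes
    differences induced by other dimensions apart.
    """
    n = len(anchors)
    total = 0.0
    for k in range(n):
        logits = [cosine(anchors[k], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[k]  # -log softmax probability of the positive
    return total / n


# Usage with three latent dimensions.
onehot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
loss_separated = info_nce_difference_loss(onehot, onehot)  # close to 0
same = [[1.0, 1.0, 0.0]] * 3
loss_entangled = info_nce_difference_loss(same, same)  # equals log(3)
```

When every dimension produces a distinct, well-separated difference embedding, the loss is close to zero. When all dimensions produce the same embedding, so that the model cannot tell which factor changed, the loss rises to log(n). That indistinguishable case is the one a difference-based contrastive term is meant to penalize.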

Abstract

In this study, Disentanglement in Difference (DiD) is proposed to address the inherent inconsistency between the statistical independence of latent variables and the goal of semantic disentanglement in disentangled representation learning. Conventional disentanglement methods obtain disentangled representations by increasing the statistical independence among latent variables. However, statistically independent latent variables are not necessarily semantically unrelated. Consequently, improving statistical independence does not always improve disentanglement performance. To address this issue, DiD directly learns the semantic differences between latent variables rather than their statistical independence. In DiD, a Difference Encoder is designed to measure semantic differences, and a contrastive loss function is established to facilitate inter-dimensional comparison. Together, these components allow the model to directly differentiate and disentangle distinct semantic factors, thereby resolving the inconsistency between statistical independence and semantic disentanglement. Experimental results on the dSprites and 3DShapes datasets demonstrate that the proposed DiD outperforms existing mainstream methods across several disentanglement metrics (MIG, DCI-D, and SAP).
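The claim that independence does not imply semantic separation can be checked with a toy example that does not come from the paper. Take two independent standard Gaussian factors and rotate them by 45 degrees. The rotated pair is still jointly Gaussian with identity covariance, so the two latents remain independent and have zero total correlation. Even so, each latent depends on both factors. The sketch below estimates the relevant correlations by sampling. The sample size, the seed, and the reading of the factors as, for example, "shape" and "position" are arbitrary assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


random.seed(0)
n = 20000
# Two independent ground-truth factors (toy stand-ins for two semantic factors).
v1 = [random.gauss(0.0, 1.0) for _ in range(n)]
v2 = [random.gauss(0.0, 1.0) for _ in range(n)]

# A 45-degree rotation: (z1, z2) is again standard Gaussian, so z1 and z2 are
# independent (total correlation 0), yet each latent mixes both factors.
s = 1.0 / math.sqrt(2.0)
z1 = [s * (a - b) for a, b in zip(v1, v2)]
z2 = [s * (a + b) for a, b in zip(v1, v2)]

corr_latents = pearson(z1, z2)  # near 0: the latents are independent
corr_z1_v1 = pearson(z1, v1)    # near +0.707: z1 carries factor v1 ...
corr_z1_v2 = pearson(z1, v2)    # near -0.707: ... and factor v2 equally
```

An objective that only drives total correlation toward zero cannot distinguish this rotated solution from the axis-aligned one. That gap is what motivates comparing semantic differences directly.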

Paper Structure

This paper contains 18 sections, 6 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: The framework of the proposed DiD. The blue points denote $\mathbf{c}$ sampled from the uniform distribution; the red points represent samples obtained from two orthogonal axes.
  • Figure 2: Relationship between Total Correlation (TC) and Mutual Information Gap (MIG). The scatter plots illustrate that a lower Total Correlation does not consistently guarantee improved disentanglement performance.
  • Figure 3: Ablation study of model performance as the number of dimensions used for difference comparison is varied. The x-axis indicates how many dimensions, out of the total, are selected for comparison.
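Figure 2 compares Total Correlation against MIG. As a reference point, the following sketch computes MIG in its commonly used form. For each ground-truth factor, it takes the gap between the two largest mutual-information values across latent dimensions, normalizes that gap by the factor's entropy, and averages over factors. The sketch assumes that latents have already been discretized into integer codes, that there are at least two latent dimensions, and that every factor takes at least two values. The paper's own estimator settings, such as binning and sample count, are not stated here, and the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """Empirical Shannon entropy (nats) of a list of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mig(latents, factors):
    """Mutual Information Gap.

    latents: list of discretized latent dimensions (each a list of codes).
    factors: list of ground-truth factors (each a list of values, same length).
    """
    gaps = []
    for v in factors:
        mis = sorted((mutual_info(z, v) for z in latents), reverse=True)
        gaps.append((mis[0] - mis[1]) / entropy(v))  # normalized top-2 gap
    return sum(gaps) / len(gaps)


# Usage: two factors with 4 values each, enumerated over the full 4 x 4 grid.
f1 = [a for a in range(4) for b in range(4)]
f2 = [b for a in range(4) for b in range(4)]
mig_disentangled = mig([f1, f2], [f1, f2])  # one latent per factor: 1.0
mig_entangled = mig([f1, f1], [f1, f2])     # duplicated latent, f2 missed: 0.0
```

MIG rewards each factor being captured by exactly one latent dimension. A representation can therefore have low total correlation and still score poorly on MIG, which is the mismatch Figure 2 illustrates.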