
CFASL: Composite Factor-Aligned Symmetry Learning for Disentanglement in Variational AutoEncoder

Hee-Jun Jung, Jaehyoung Jeong, Kangil Kim

Abstract

Symmetries of input and latent vectors have provided valuable insights for disentanglement learning in VAEs. However, only a few unsupervised methods have been proposed, and even these require known factor information in the training data. We propose a novel method, Composite Factor-Aligned Symmetry Learning (CFASL), which is integrated into VAEs to learn symmetry-based disentanglement in an unsupervised manner, without any knowledge of the dataset's factors. CFASL incorporates three novel features for learning symmetry-based disentanglement: 1) injecting an inductive bias that aligns latent vector dimensions with factor-aligned symmetries in an explicit, learnable symmetry codebook; 2) learning a composite symmetry that expresses the change of unknown factors between two random samples by learning the factor-aligned symmetries within the codebook; and 3) inducing a group-equivariant encoder and decoder when training VAEs under these two conditions. In addition, we propose an evaluation metric that extends disentanglement evaluation in VAEs to multi-factor changes. In quantitative and in-depth qualitative analyses, CFASL significantly improves disentanglement over state-of-the-art methods under both single-factor and multi-factor change conditions.
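The composite symmetry described in the abstract can be made concrete with a minimal sketch. Following the Figure 2 caption, it assumes that each codebook section holds Lie algebra elements (here d×d matrices) that are mixed by $attn$ weights and gated by the section probability $p_s$. It further assumes, beyond what the source states, that $g_c$ is the product of the per-section matrix exponentials and that $\mathcal{L}_{ee}$ is a mean squared error. All function names and the homogeneous-coordinate example are hypothetical; this is an illustration, not the authors' implementation.

```python
from typing import List, Sequence

# Plain nested lists keep the sketch dependency-free; a real model would use tensors.
Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a: Matrix, s: float) -> Matrix:
    return [[x * s for x in row] for row in a]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def expm(a: Matrix, terms: int = 30) -> Matrix:
    """Matrix exponential via the truncated Taylor series sum_k a^k / k!."""
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, a), 1.0 / k)  # now equals a^k / k!
        result = mat_add(result, term)
    return result


def composite_symmetry(codebook: Sequence[Sequence[Matrix]],
                       attn: Sequence[Sequence[float]],
                       p_s: Sequence[float]) -> Matrix:
    """Hypothetical construction of g_c from one group element per section.

    codebook[i][k]: k-th Lie algebra element (d x d) of section i.
    attn[i][k]:     assumed mixing weight of that element.
    p_s[i]:         assumed probability that section i is active;
                    p_s[i] = 0 turns section i into the identity.
    """
    d = len(codebook[0][0])
    g_c = identity(d)
    for section, weights, p in zip(codebook, attn, p_s):
        algebra = [[0.0] * d for _ in range(d)]
        for generator, w in zip(section, weights):
            algebra = mat_add(algebra, mat_scale(generator, w))
        # Compose exp(p * sum_k w_k G_k) onto the symmetries of earlier sections.
        g_c = mat_mul(expm(mat_scale(algebra, p)), g_c)
    return g_c


def act(g: Matrix, z: Sequence[float]) -> List[float]:
    """Apply the group element g to the latent vector z."""
    return [sum(x * y for x, y in zip(row, z)) for row in g]


def equivariance_loss(g_c: Matrix, z1: Sequence[float], z2: Sequence[float]) -> float:
    """Assumed MSE form of L_ee: compare g_c z1 (= z2') with z2."""
    z2_hat = act(g_c, z1)
    return sum((a - b) ** 2 for a, b in zip(z2_hat, z2)) / len(z2)


# Toy latent in homogeneous coordinates [x, y, 1]: E_x and E_y generate shifts.
E_x = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
E_y = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
g_c = composite_symmetry([[E_x], [E_y]], attn=[[0.5], [2.0]], p_s=[1.0, 1.0])
print(act(g_c, [0.0, 0.0, 1.0]))                                # [0.5, 2.0, 1.0]
print(equivariance_loss(g_c, [0.0, 0.0, 1.0], [0.5, 2.0, 1.0]))  # 0.0
```

Gating inside the exponential means a section with $p_s = 0$ contributes the identity, so an inactive section leaves $z_1$ unchanged while active sections compose into a single $g_c$.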

Paper Structure (72 sections, 14 equations, 14 figures, 10 tables)

Figures (14)

  • Figure 1: Distribution of latent vectors in the dimensions responsible for the Shape, X-pos, and Y-pos factors of the dSprites dataset. The groupified-VAE method is applied to $\beta$-TCVAE because this base model yields a better evaluation score. Coloring the three shapes (square, ellipse, and heart) red, blue, and green, respectively, shows how shape is disentangled from the combination of the other two factors. Each 3D plot shows the whole distribution. We fix the Scale and Orientation factor values and plot 640 randomly sampled inputs (20.8$\%$ of all $32\times 32\times 3 = 3,072$ possible observations). The dimension responsible for each factor is selected as the one with the largest Kullback-Leibler divergence between the prior and the posterior. Cont.-VAE denotes Control-VAE.
  • Figure 2: The overall architecture of the proposed method. $\longleftrightarrow$ denotes a loss function. 1) Given a pair of images (e.g., two images that differ in x- and y-position), the goal of the model is to represent their difference in the latent vector space, called the composite symmetry $g_c$, for disentangled representations. 2) The codebook is designed to represent the composite symmetry $g_c$. 3) Each section of the codebook is designed to affect a single factor, e.g., the $i^{th}$ section affects the x-position and the $j^{th}$ section affects the y-position of images. 4) Each section consists of Lie algebra elements to provide diverse symmetries. 5) As shown in (b), the following losses optimize the codebook to guarantee 3): i) the parallel loss $\mathcal{L}_{pl}$ makes symmetries from the same section affect the same factor (e.g., the symmetries $\exp(\mathfrak{g}^i_k)$ from the $i^{th}$ section affect only the x-position); ii) the perpendicular loss $\mathcal{L}_{pd}$ makes different sections affect different factors (e.g., the symmetries $g_c^i$ and $g_c^j$ from the $i^{th}$ and $j^{th}$ sections affect the x-position and y-position, respectively); and iii) the sparsity loss $\mathcal{L}_s$ makes each section change only a single dimension of the latent vector for disentangled representation. 6) $attn$ ensures diversity in the symmetry representations, and $p_s$ predicts the activated sections, in this case the $i^{th}$ and $j^{th}$ sections for the x- and y-position differences. 7) The model then constructs the composite symmetry $g_c$. 8) Lastly, the model optimizes $\mathcal{L}_{ee}$ to match $g_c z_1$ ($=z^\prime_2$) and $z_2$, and $\mathcal{L}_{de}$ to match $x_2$ and $p_\theta (g_c \circ q_\phi(x_1))$, to inject the inductive bias.
  • Figure 3: Roles of the parallel, perpendicular, and sparsity losses on symmetries in the codebook in adjusting representation changes. The parallel loss acts on symmetries from the same section, and the perpendicular loss acts on symmetries from different sections. Each axis (x and y) affects only a single factor.
  • Figure 4: Loss curves: 1) HT: hyperparameter tuning ($\epsilon \in \{0.01, 0.1, 1.0 \}$) with $\beta$-TCVAE-based CFASL. 2) AB: ablation study with $\beta$-VAE-based CFASL.
  • Figure 5: Heatmaps of eigenvectors for latent vector representations.
  • ...and 9 more figures
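The dimension selection described in the Figure 1 caption can be sketched as follows, assuming a diagonal Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$ and a standard normal prior, for which the per-dimension KL divergence has the closed form $\tfrac{1}{2}(\mu^2 + \sigma^2 - 1 - \log \sigma^2)$. Ranking dimensions by batch-averaged KL and keeping the top $k$ is a simplifying assumption; the caption does not say how each selected dimension is matched to a specific factor. `kl_per_dimension` and `top_kl_dimensions` are hypothetical names.

```python
import math
from typing import List, Sequence


def kl_per_dimension(mu: Sequence[Sequence[float]],
                     logvar: Sequence[Sequence[float]]) -> List[float]:
    """Batch-averaged KL(N(mu, sigma^2) || N(0, 1)) for each latent dimension.

    Closed form per dimension: 0.5 * (mu^2 + sigma^2 - 1 - log sigma^2),
    with sigma^2 = exp(logvar) as output by a typical VAE encoder.
    """
    n, d = len(mu), len(mu[0])
    totals = [0.0] * d
    for mu_row, lv_row in zip(mu, logvar):
        for j in range(d):
            totals[j] += 0.5 * (mu_row[j] ** 2 + math.exp(lv_row[j]) - 1.0 - lv_row[j])
    return [t / n for t in totals]


def top_kl_dimensions(mu, logvar, k: int) -> List[int]:
    """Indices of the k latent dimensions with the largest average KL (assumed ranking rule)."""
    kl = kl_per_dimension(mu, logvar)
    return sorted(range(len(kl)), key=lambda j: kl[j], reverse=True)[:k]


# Two encoded inputs, three latent dimensions, unit posterior variance.
mu = [[0.0, 2.0, 1.0], [0.0, -2.0, -1.0]]
logvar = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(kl_per_dimension(mu, logvar))      # [0.0, 2.0, 0.5]
print(top_kl_dimensions(mu, logvar, 2))  # [1, 2]
```

Dimension 0 matches the prior exactly and carries no information, so it is ranked last; in the Figure 1 setting, $k = 3$ would give the three axes of each 3D plot.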
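The roles of $\mathcal{L}_{pl}$, $\mathcal{L}_{pd}$, and $\mathcal{L}_s$ stated in the Figure 2 and Figure 3 captions can be illustrated on latent change vectors $g z - z$ produced by codebook symmetries. The formulas below (absolute cosine similarity for the parallel and perpendicular terms, an $\ell_1/\ell_2$ ratio for sparsity) are hypothetical stand-ins chosen to express each stated goal, not the paper's definitions, and all function names are illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def _abs_cos(u: Vector, v: Vector, eps: float = 1e-12) -> float:
    """|cosine similarity|, so opposite directions count as parallel."""
    return abs(sum(a * b for a, b in zip(u, v))) / (_norm(u) * _norm(v) + eps)


def _mean(terms: List[float]) -> float:
    return sum(terms) / len(terms) if terms else 0.0


def parallel_loss(deltas: Sequence[Sequence[Vector]]) -> float:
    """Hypothetical L_pl: changes from the same section should be parallel (mean 1 - |cos|)."""
    return _mean([1.0 - _abs_cos(u, v)
                  for section in deltas for u, v in combinations(section, 2)])


def perpendicular_loss(deltas: Sequence[Sequence[Vector]]) -> float:
    """Hypothetical L_pd: changes from different sections should be orthogonal (mean |cos|)."""
    return _mean([_abs_cos(u, v)
                  for s_a, s_b in combinations(deltas, 2) for u in s_a for v in s_b])


def sparsity_loss(deltas: Sequence[Sequence[Vector]], eps: float = 1e-12) -> float:
    """Hypothetical L_s: ||d||_1 / ||d||_2 - 1 is ~0 only when d changes a single dimension."""
    return _mean([sum(abs(x) for x in u) / (_norm(u) + eps) - 1.0
                  for section in deltas for u in section])


# deltas[i][k]: latent change g z - z caused by the k-th symmetry of section i.
aligned = [[[1.0, 0.0, 0.0], [-2.0, 0.0, 0.0]],  # section i moves only dimension 0
           [[0.0, 0.5, 0.0], [0.0, 3.0, 0.0]]]   # section j moves only dimension 1
mixed = [[[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
         [[1.0, 0.0, 0.0]]]
for name, deltas in [("aligned", aligned), ("mixed", mixed)]:
    print(name, parallel_loss(deltas), perpendicular_loss(deltas), sparsity_loss(deltas))
```

All three losses are close to 0 for `aligned` and positive for `mixed`. Using $|\cos|$ treats a shift and its inverse (e.g., left versus right) as parallel, which fits symmetries from one section acting on the same factor in either direction.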