Independence Constrained Disentangled Representation Learning from Epistemological Perspective

Ruoyu Wang, Lina Yao

TL;DR

The paper introduces a two-level latent space framework as a general resolution of prior arguments over the interrelationships between latent variables. It also proposes a novel method for disentangled representation learning that combines a mutual information constraint with an independence constraint within the Generative Adversarial Network (GAN) framework.

Abstract

Disentangled Representation Learning aims to improve the explainability of deep learning methods by training a data encoder that identifies semantically meaningful latent variables in the data generation process. Nevertheless, there is no consensus on a universally accepted definition of the objective of disentangled representation learning. In particular, there is considerable debate over whether the latent variables should be mutually independent. In this paper, we first investigate these arguments on the interrelationships between latent variables by establishing a conceptual bridge between Epistemology and Disentangled Representation Learning. Then, inspired by these interdisciplinary concepts, we introduce a two-level latent space framework to provide a general solution to the prior arguments on this issue. Finally, we propose a novel method for disentangled representation learning that integrates a mutual information constraint and an independence constraint within the Generative Adversarial Network (GAN) framework. Experimental results demonstrate that our proposed method consistently outperforms baseline approaches in both quantitative and qualitative evaluations. The method exhibits strong performance across multiple commonly used metrics and demonstrates a strong capability for disentangling various semantic factors, leading to improved quality of controllable generation, which in turn benefits the explainability of the algorithm.

Paper Structure (21 sections, 7 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: Disentangled Representation Learning learns semantically meaningful latent variables such as Rotation and Digit. Our method outperforms existing methods by encouraging the factors to be further separated. For example, on MNIST: (a) Without the independence constraint in our method, the digit and width are affected when traversing on the rotation variable; (b) With the independence constraint, digit and width are NOT affected when traversing on rotation.
  • Figure 2: (a) In epistemology, the complex idea "apple" is composed of simple ideas such as its colour, shape and size, which are irreducible and mutually independent; (b) Our proposed framework groups latent variables into two levels: 1) Atomic Level $Z_{A}$: comprises factors that are basic and irreducible, such as the colour and shape of an apple, where all factors are mutually independent; 2) Complex Level $Z_{C}$: comprises factors that are derived from the atomic level, such as the concept of apple. The two levels are connected by causal relationships.
  • Figure 3: Framework of TC-GAN. Two batches of latent variables $z_{1}, z_{2}$ are sampled from the atomic level latent space and passed to the generator $G$ to generate the fake images. The discriminator $D$ and the auxiliary network $Q$ share the same data encoder $E$. The network $Q$ outputs the predicted mean $\mu_{1}, \mu_{2}$ and variance $\sigma_{1}, \sigma_{2}$ of the latent factor. Then, we employ the re-parameterization trick and the permute-dim algorithm to aid the $TCD$ in estimating Total Correlation.
  • Figure 4: dSprites Samples for comparison. Both models are trained with 5 continuous variables and 5 noise variables. (a) Without the independence constraint, shape is affected by traversing on rotation and rotation is affected by traversing on PosX; (b) With the independence constraint, these factors are unaffected by traversing on another variable.
  • Figure 5: FashionMNIST Samples for comparison. Both models are trained with 1 ten-dimensional discrete variable, 1 continuous variable, and 62 noise variables. (a) Without the constraint on the independence between variables, item type is affected by traversing on thickness; (b) With the constraint on the independence between variables, item type is unaffected while traversing on thickness.
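
The re-parameterization and permute-dim steps named in the Figure 3 caption can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names `reparameterize` and `permute_dims`, the toy batch, and the treatment of `sigma` as a standard deviation are assumptions made here, and permute-dim is written in its commonly used form, where each latent dimension is shuffled independently across the batch so that the result approximates samples from the product of the marginals.

```python
import random


def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1).

    Assumption: sigma is interpreted as a standard deviation.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def permute_dims(batch, rng):
    """Shuffle every latent dimension independently across the batch.

    Each column keeps its own values (its marginal), but the pairing
    between columns within a row is broken.
    """
    n_dims = len(batch[0])
    columns = []
    for j in range(n_dims):
        column = [row[j] for row in batch]
        rng.shuffle(column)  # in-place shuffle of dimension j only
        columns.append(column)
    # Transpose the shuffled columns back into rows.
    return [list(row) for row in zip(*columns)]


# Hypothetical toy batch: 4 samples, 3 latent dimensions.
rng = random.Random(0)
mu = [[float(i), 10.0 * i, 100.0 * i] for i in range(4)]
sigma = [[0.1, 0.1, 0.1] for _ in range(4)]

z = [reparameterize(m, s, rng) for m, s in zip(mu, sigma)]
z_perm = permute_dims(z, rng)
```

In a setup like the one the caption describes, the unpermuted batch would stand in for samples from the joint distribution and the permuted batch for samples from the product of marginals, which is the contrast a Total Correlation discriminator would be trained to tell apart; how the two batches $z_{1}, z_{2}$ are assigned to these roles in the paper is not specified here.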