A Revisit of Total Correlation in Disentangled Variational Auto-Encoder with Partial Disentanglement

Chengrui Li, Yunmiao Wang, Yule Wang, Weihan Li, Dieter Jaeger, Anqi Wu

TL;DR

PDisVAE introduces a flexible, partially disentangled variational auto-encoder that replaces the full-independence TC penalty with a partial correlation (PC) penalty to enforce group-wise independence while allowing within-group entanglement. By unifying the grouping parameter G with latent dimensionality K, PDisVAE smoothly interpolates between standard VAEs and fully disentangled VAEs, and it naturally accommodates rank deficiencies within groups. The authors derive an optimal importance-sampling batch approximation for estimating the PC term, and validate the approach on synthetic, partial dSprites, CelebA, and neural data, showing improved recovery of group structure and richer, more interpretable representations than fully disentangled methods. The framework offers a practical and versatile tool for learning latent representations that honor realistic, group-wise independence structures in complex data. This has broad implications for applications in computer vision and neuroscience where factors of variation exhibit partial, rather than complete, independence.

Abstract

A fully disentangled variational auto-encoder (VAE) aims to identify disentangled latent components from observations. However, enforcing full independence between all latent components may be too strict for certain datasets. In some cases, multiple factors may be entangled together in a non-separable manner, or a single independent semantic meaning could be represented by multiple latent components within a higher-dimensional manifold. To address such scenarios with greater flexibility, we develop the Partially Disentangled VAE (PDisVAE), which generalizes the total correlation (TC) term in fully disentangled VAEs to a partial correlation (PC) term. This framework can handle group-wise independence and can naturally reduce to either the standard VAE or the fully disentangled VAE. Validation through three synthetic experiments demonstrates the correctness and practicality of PDisVAE. When applied to real-world datasets, PDisVAE discovers valuable information that is difficult to find using fully disentangled VAEs, implying its versatility and effectiveness.
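To make the relationship between TC and PC concrete, the following is a minimal sketch under stated assumptions, not the authors' estimator. It assumes the PC term is the KL divergence between the joint latent distribution and the product of its group marginals, which matches the reductions described above (one group gives no penalty, one component per group gives TC). For a zero-mean Gaussian this divergence has a standard closed form, $\tfrac{1}{2}\big(\sum_g \log\det\Sigma_g - \log\det\Sigma\big)$, which the hypothetical helper `gaussian_pc` evaluates. The covariance matrix and the grouping are made up for illustration.

```python
import numpy as np

def gaussian_pc(cov, groups):
    """KL( N(0, cov) || product of group marginals ), in nats.

    `groups` is a list of index lists that partition the latent dimensions.
    This closed form only holds for Gaussians; it is an illustration,
    not the minibatch importance-sampling estimator used in the paper.
    """
    cov = np.asarray(cov, dtype=float)
    logdet_joint = np.linalg.slogdet(cov)[1]
    logdet_groups = 0.0
    for idx in groups:
        block = cov[np.ix_(idx, idx)]  # covariance of one group
        logdet_groups += np.linalg.slogdet(block)[1]
    return 0.5 * (logdet_groups - logdet_joint)

# Hypothetical 4-D latent: (z1, z2) entangled, (z3, z4) entangled,
# and the two groups independent of each other.
cov = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -0.5],
    [0.0, 0.0, -0.5, 1.0],
])

pc_one_group = gaussian_pc(cov, [[0, 1, 2, 3]])      # G = 1: no independence enforced
pc_two_groups = gaussian_pc(cov, [[0, 1], [2, 3]])   # G = 2: matches the true grouping
tc = gaussian_pc(cov, [[0], [1], [2], [3]])          # G = K: total correlation

print(pc_one_group, pc_two_groups, tc)
```

In this toy case the PC with the correct two-group partition is zero even though the TC is strictly positive, which is the situation where a full-independence penalty would be too strict.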

Paper Structure

This paper contains 42 sections, 1 theorem, 19 equations, 14 figures, 5 tables.

Key Result

Theorem 1.1

$(x_1, \dots, x_I) \perp (y_1, \dots, y_J) \iff f(x_1, \dots, x_I) \perp g(y_1, \dots, y_J)$ for all measurable functions $f$ and $g$.
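The forward direction of the theorem can be checked numerically on a toy example. The sketch below is an illustration under assumptions, not part of the paper: the distributions and the functions `f` and `g` are arbitrary choices, and a near-zero sample correlation is only a necessary consequence of independence, not a proof of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Group x: two components that are strongly entangled with each other.
x1 = rng.uniform(-1, 1, n)
x2 = x1**2 + 0.1 * rng.normal(size=n)

# Group y: sampled independently of group x, also entangled internally.
y1 = rng.normal(size=n)
y2 = np.sin(y1) + 0.1 * rng.normal(size=n)

# Arbitrary (hypothetical) functions of each group.
f = x1 * x2 + np.cos(x2)
g = np.tanh(y1) - y2**2

# Functions of independent groups should be uncorrelated up to sampling noise.
corr_between = np.corrcoef(f, g)[0, 1]
# Within a group, dependence is allowed and clearly visible.
corr_within = np.corrcoef(x1**2, x2)[0, 1]

print(f"between groups: {corr_between:.4f}, within group: {corr_within:.4f}")
```

Any other pair of measurable functions could be substituted for `f` and `g`; the between-group correlation should stay near zero while within-group dependence remains.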

Figures (14)

  • Figure 1: Visual illustrations for the desired behavior of the PDisVAE. In each case, the left plot is the estimated latent $(\hat{z}_i, \hat{z}_j)$ and the right plot is the true latent $(z_i, z_j)$.
  • Figure 2: (a): The true latent $\bm{z} \in \mathbb{R}^6$, where the three groups satisfy $(z_1,z_2)\perp(z_3,z_4)\perp(z_5,z_6)$ but components within each group are highly entangled (top row). Latent components in different groups are marginally independent (bottom row). (b): The PC of the estimated latent and the latent $R^2$ after alignment to the true latent in (a), with pair-wise $t$-tests showing the significance level (***: $p\leqslant 0.001$, ****: $p\leqslant 0.0001$). (c): The estimated latent of PDisVAE before aligning to the true latent in (a). In each group, PCA shows the explained variance ratio in the group. Within-group TC shows the minimum TC under all possible linear transformations. The normal test shows the $p$-values for the null hypothesis that a marginal distribution is normal. If, for example, $p>0.05$, we cannot reject the null hypothesis, which suggests that the component is a Gaussian-noise dummy latent component. The pair TC is directly measured from the components in different groups.
  • Figure 3: Estimated latent after aligning to the true latent (Fig. 2(a)) for various methods. Left three columns: the three independent groups; rightmost column: a between-group component pair. VAE and ICA results are in Fig. \ref{fig:synthetic_weak_latent_all} in Appendix \ref{appendix:supplimentary_results}.
  • Figure 4: (a): Latent and observation generating process. Locations $(z_1, z_2)$ are entangled and uniformly distributed in a restricted region. Color represents the location information, with the upper and lower gray triangular areas being empty. The size $z_3$ is evenly distributed across five scales, represented by different markers, and is independent of the location. (b): Images reconstructed by varying one of the latent groups ($(\hat{z}_1, \hat{z}_2)$ or $(\hat{z}_3, \hat{z}_4)$) found by $\beta$-TCVAE and PDisVAE.
  • Figure 5: The latent plot after alignment for group 1 $(z_1, z_2)$ and group 2 $(z_3, z_4 \approx 0)$ from different methods, and their corresponding PC and latent $R^2$. The color representation for location is the same as in Fig. 4(a), and the marker of each point in the latent plots represents the size of the square in the observation images.
  • ...and 9 more figures
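The within-group PCA diagnostic mentioned in the Figure 2(c) caption, and the rank-deficient group $(z_3, z_4 \approx 0)$ in Figure 5, can be illustrated with a small sketch. This is an assumption-laden toy example: the helper `explained_variance_ratio` and the synthetic data are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def explained_variance_ratio(z_group):
    """PCA explained variance ratio for one latent group (rows are samples)."""
    cov = np.cov(z_group, rowvar=False)       # np.cov centers the data itself
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negative values
    return eigvals / eigvals.sum()

rng = np.random.default_rng(0)
n = 5000

# Hypothetical 2-D group where only one component carries signal,
# mimicking a group such as (z3, z4 ~ 0).
signal = rng.uniform(-1, 1, n)
dummy = 1e-3 * rng.normal(size=n)
group = np.column_stack([signal, dummy])

ratio = explained_variance_ratio(group)
print(ratio)
```

A ratio concentrated almost entirely on the first principal component suggests that the group is effectively one-dimensional, which is how a rank deficiency inside a group would show up in this kind of diagnostic.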

Theorems & Definitions (2)

  • Theorem 1.1
  • proof