From Core to Detail: Unsupervised Disentanglement with Entropy-Ordered Flows

Daniel Galperin, Ullrich Köthe

TL;DR

EOFlows address unsupervised disentanglement by ordering latent coordinates via explained entropy and introducing a Maximum Manifold Likelihood objective that jointly trains density estimation, manifold learning, and disentanglement. The core–detail decomposition yields a flexible, inference-time bottleneck (top $C$ core dimensions) and a tractable stochastic estimator for the total disentanglement term, enabling scalable training on high-dimensional data. In experiments across EMNIST, Entangled Digits, and CelebA, EOFlows produce interpretable, semantically meaningful latent factors, enable strong compression and denoising, and exhibit a PCA-like solution in the linear Gaussian limit while maintaining non-linear expressiveness. The work demonstrates a principled, information-theoretic pathway to learn stable, disentangled, and compressible representations with practical benefits for visualization, editing, and rate–distortion trade-offs in generative modeling.

Abstract

Learning unsupervised representations that are both semantically meaningful and stable across runs remains a central challenge in modern representation learning. We introduce entropy-ordered flows (EOFlows), a normalizing-flow framework that orders latent dimensions by their explained entropy, analogously to PCA's explained variance. This ordering enables adaptive injective flows: after training, one may retain only the top C latent variables to form a compact core representation while the remaining variables capture fine-grained detail and noise, with C chosen flexibly at inference time rather than fixed during training. EOFlows build on insights from Independent Mechanism Analysis, Principal Component Flows and Manifold Entropic Metrics. We combine likelihood-based training with local Jacobian regularization and noise augmentation into a method that scales well to high-dimensional data such as images. Experiments on the CelebA dataset show that our method uncovers a rich set of semantically interpretable features, allowing for high compression and strong denoising.
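
The PCA analogy and the inference-time choice of C can be illustrated in the linear Gaussian limit. The following is a minimal sketch under stated assumptions, not the authors' implementation: the decoder is purely linear and fitted by PCA, the Gaussian differential entropy $\frac{1}{2}\log(2\pi e \lambda_i)$ of each principal direction stands in for the explained entropy, and the names `fit_linear_flow`, `encode`, and `decode_core` are hypothetical.

```python
import numpy as np


def fit_linear_flow(x):
    """Fit PCA as a linear 'flow' (assumption: linear Gaussian limit only).

    Returns the data mean, a decoder matrix whose columns are ordered by
    decreasing entropy, the eigenvalues, and the per-dimension entropies.
    """
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    decoder = eigvecs * np.sqrt(eigvals)          # x = mean + decoder @ z
    # Differential entropy of a 1-D Gaussian with variance eigvals[i].
    entropies = 0.5 * np.log(2.0 * np.pi * np.e * eigvals)
    return mean, decoder, eigvals, entropies


def encode(x, mean, decoder, eigvals):
    """Map data to latents with unit variance per dimension."""
    return (x - mean) @ decoder / eigvals


def decode_core(z, mean, decoder, num_core):
    """Decode using only the top `num_core` latents; detail latents are set to 0."""
    z_core = z.copy()
    z_core[:, num_core:] = 0.0
    return mean + z_core @ decoder.T
```

Because the latents are sorted once, `num_core` can be changed freely after fitting. In this linear case the mean squared reconstruction error equals the sum of the discarded eigenvalues; the non-linear EOFlow objective described above is not reproduced here.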

Paper Structure (61 sections, 8 theorems, 136 equations, 27 figures, 1 table)


Key Result

Theorem 2.1

Assume a core-detail split of the latent space $\boldsymbol{z} = [\boldsymbol{z}_{\mathbb{C}}, \boldsymbol{z}_{\mathbb{D}}]$ and the following decoder, where $\hat{g}: \mathbb{R}^{|\mathbb{C}|} \rightarrow \mathbb{R}^D$ is an (unrestricted) injective mapping, $\boldsymbol{U}$ …

Figures (27)

  • Figure 1: Depiction of different push-forwards of a latent standard normal (left) through linear (center) and non-linear (right) decoders, inducing affine and curvilinear coordinates in the data space respectively. Line width and color indicate the conditional densities along the coordinate lines in rows 2 and 3. Row 4 shows the pointwise manifold mutual information between $\boldsymbol{U}_{\mathbb{C}}$ and $\boldsymbol{U}_{\mathbb{D}}$, defined as $\mathcal{L}_{\mathbb{C} \perp \mathbb{D}} = \log(q / (q_{\mathbb{C}} \cdot q_{\mathbb{D}})) \geq 0$, which is a non-linear generalization of the classical correlation. Linear decoders can only generate Gaussian distributions, and the PCA-solution induces orthogonal coordinates such that $q$ factorizes exactly into $q_{\mathbb{C}}$ and $q_{\mathbb{D}}$. EOFlows extend this to non-Gaussian distributions and non-linear mappings. When the decoder Jacobian has orthogonal columns everywhere, the resulting curvilinear coordinates become orthogonal, resembling polar coordinates in this example. Only in the orthogonal cases, $\mathcal{L}_{\mathbb{C} \perp \mathbb{D}}$ vanishes everywhere and the constituent densities factorize "cleanly" as $q = q_{\mathbb{C}} \cdot q_{\mathbb{D}}$ (purple in the bottom row).
  • Figure 2: Multiple EOFlows trained on EMNIST with varying noise level $\sigma_{\boldsymbol{\epsilon}} \in \{0.01, 0.03, 0.1\}$ and varying Total Disentanglement strength $\lambda_\text{TC} \in [0, 10]$.
  • Figure 3: Average Jacobian column vectors $\mathbb{E}[\boldsymbol{J}_i]$ of selected latent dimensions, where each image is normalized for increased contrast. High entropy dimensions (top row) are responsible for global factors such as slant, thickness, etc. Low entropy dimensions (bottom row) indicate preprocessing artifacts in EMNIST.
  • Figure 4: Disentangling entangled digits. The left block shows four random inflated data samples (left to right); the right block mirrors the left block with a latent bottleneck of $C = 100$. Top row: entangled data samples with $\alpha = 0.5$. Second and third rows: the most important latent dimension is edited to the value $+2$ and $-2$, respectively. Bottom row: superposition of the second and third rows, which closely resembles the original first row.
  • Figure 5: Manifold entropy spectra of multiple EOFlows trained on CelebA with $\sigma_{\boldsymbol{\epsilon}} = 0.1$ and varying $\lambda_\text{TC} \in \{0, 0.01, 0.1, 1.0\}$, together with the PCA solution plotted as a linearized EOFlow. The noise level acts as a lower bound on the manifold entropy and provides a natural cutoff between core and detail dimensions.
  • ...and 22 more figures
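
The factorization statement in the Figure 1 caption has a simple classical counterpart for Gaussians. The sketch below is an illustration under assumptions, not the paper's manifold quantity: it computes the ordinary mutual information between two coordinate blocks of a joint Gaussian, $\frac{1}{2}\left(\log\det\Sigma_{\mathbb{C}} + \log\det\Sigma_{\mathbb{D}} - \log\det\Sigma\right)$, which is the expectation of $\log(q/(q_{\mathbb{C}} \cdot q_{\mathbb{D}}))$ when $q_{\mathbb{C}}$ and $q_{\mathbb{D}}$ are taken to be the block marginals. The function names are hypothetical.

```python
import numpy as np


def gaussian_mutual_information(cov, core_idx, detail_idx):
    """Mutual information (in nats) between two blocks of a joint Gaussian.

    Assumption: q_C and q_D are the Gaussian marginals of the two blocks.
    """
    cov = np.asarray(cov, dtype=float)
    core_idx, detail_idx = list(core_idx), list(detail_idx)
    all_idx = core_idx + detail_idx

    def logdet(matrix):
        # slogdet returns (sign, log|det|); covariances have sign +1.
        return np.linalg.slogdet(matrix)[1]

    cov_core = cov[np.ix_(core_idx, core_idx)]
    cov_detail = cov[np.ix_(detail_idx, detail_idx)]
    cov_joint = cov[np.ix_(all_idx, all_idx)]
    return 0.5 * (logdet(cov_core) + logdet(cov_detail) - logdet(cov_joint))


def rotate_to_pca(cov):
    """Express the covariance in its eigenbasis (orthogonal PCA coordinates)."""
    cov = np.asarray(cov, dtype=float)
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs.T @ cov @ eigvecs
```

For a correlated covariance such as `[[2.0, 1.2], [1.2, 1.0]]` the mutual information between the two coordinates is positive, while after `rotate_to_pca` it vanishes up to floating-point error, mirroring the caption's statement that the PCA solution factorizes exactly.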

Theorems & Definitions (25)

  • Definition 1.1: Curvilinear coordinates
  • Definition 1.2: Curvilinear manifold
  • Definition 1.3: Manifold PDF
  • Definition 1.4: Pointwise Manifold Entropy
  • Definition 1.5: Pointwise Manifold Mutual Information
  • Definition 1.6: Maximum Manifold Likelihood objective
  • Definition 1.8
  • Definition 1.9
  • Definition 1.10
  • Theorem 2.1
  • ...and 15 more