From Core to Detail: Unsupervised Disentanglement with Entropy-Ordered Flows

Daniel Galperin; Ullrich Köthe

TL;DR

EOFlows address unsupervised disentanglement by ordering latent coordinates via explained entropy and introducing a Maximum Manifold Likelihood objective that jointly trains density estimation, manifold learning, and disentanglement. The core–detail decomposition yields a flexible, inference-time bottleneck (top $C$ core dimensions) and a tractable stochastic estimator for the total disentanglement term, enabling scalable training on high-dimensional data. In experiments across EMNIST, Entangled Digits, and CelebA, EOFlows produce interpretable, semantically meaningful latent factors, enable strong compression and denoising, and exhibit a PCA-like solution in the linear Gaussian limit while maintaining non-linear expressiveness. The work demonstrates a principled, information-theoretic pathway to learn stable, disentangled, and compressible representations with practical benefits for visualization, editing, and rate–distortion trade-offs in generative modeling.

Abstract

Learning unsupervised representations that are both semantically meaningful and stable across runs remains a central challenge in modern representation learning. We introduce entropy-ordered flows (EOFlows), a normalizing-flow framework that orders latent dimensions by their explained entropy, analogously to PCA's explained variance. This ordering enables adaptive injective flows: after training, one may retain only the top C latent variables to form a compact core representation while the remaining variables capture fine-grained detail and noise, with C chosen flexibly at inference time rather than fixed during training. EOFlows build on insights from Independent Mechanism Analysis, Principal Component Flows and Manifold Entropic Metrics. We combine likelihood-based training with local Jacobian regularization and noise augmentation into a method that scales well to high-dimensional data such as images. Experiments on the CelebA dataset show that our method uncovers a rich set of semantically interpretable features, allowing for high compression and strong denoising.
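The PCA analogy in the abstract can be made concrete in the linear Gaussian limit, where ordering latent directions by entropy coincides with ordering them by variance. The following is a minimal NumPy sketch of that special case only, not the EOFlow training procedure: the function name `pca_core_detail`, the toy data, and the choice `num_core=2` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def pca_core_detail(x, num_core):
    """Linear-Gaussian illustration: order directions by entropy, keep the top ones.

    Returns the per-direction Gaussian entropies (in nats, descending),
    the reconstruction from the `num_core` core directions, and the detail residual.
    """
    mean = x.mean(axis=0)
    centered = x - mean
    # Sample covariance and its eigen-decomposition (eigh returns ascending order).
    cov = centered.T @ centered / (len(x) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Differential entropy of a 1-D Gaussian with variance v is 0.5 * log(2*pi*e*v),
    # so sorting by variance and sorting by entropy give the same order.
    entropies = 0.5 * np.log(2 * np.pi * np.e * np.clip(eigvals, 1e-12, None))
    z = centered @ eigvecs                 # latent code, most entropic direction first
    z_core = z[:, :num_core]               # bottleneck chosen after fitting
    x_core = z_core @ eigvecs[:, :num_core].T + mean
    return entropies, x_core, x - x_core


rng = np.random.default_rng(0)
# Hypothetical toy data: two strong directions embedded in 5-D plus small isotropic noise.
latent = rng.normal(size=(2000, 2)) * np.array([3.0, 1.5])
mixing = np.linalg.qr(rng.normal(size=(5, 5)))[0][:, :2]
x = latent @ mixing.T + 0.1 * rng.normal(size=(2000, 5))

entropies, x_core, detail = pca_core_detail(x, num_core=2)
print(entropies)       # descending; the last three sit near the noise floor
print(detail.std())    # the residual is on the order of the injected noise
```

Because the split is applied after fitting, `num_core` can be changed without recomputing the decomposition, which mirrors the inference-time choice of C described above.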

Paper Structure (61 sections, 8 theorems, 136 equations, 27 figures, 1 table)


Introduction
Related Work
Method
Normalizing Flows
Derivation of the MML training objective
Entropy ordering and disentanglement
Important special cases
Experiments
Inflation approach
Numerical estimate of Total Disentanglement
EMNIST
Entangled Digits Dataset
CelebA
Conclusion
Derivation
...and 46 more sections

Key Result

Theorem 2.1

Assume a core-detail split of the latent space $\boldsymbol{z} = [\boldsymbol{z}_{\mathbb{C}}, \boldsymbol{z}_{\mathbb{D}}]$ and the following decoder, where $\hat{g}: \mathbb{R}^{|\mathbb{C}|} \rightarrow \mathbb{R}^D$ is an (unrestricted) injective mapping, $\boldsymbol{U}_{\dots}$ …

Figures (27)

Figure 1: Depiction of different push-forwards of a latent standard normal (left) through linear (center) and non-linear (right) decoders, inducing affine and curvilinear coordinates in the data space respectively. Line width and color indicate the conditional densities along the coordinate lines in rows 2 and 3. Row 4 shows the pointwise manifold mutual information between $\boldsymbol{U}_{\mathbb{C}}$ and $\boldsymbol{U}_{\mathbb{D}}$, defined as $\mathcal{L}_{\mathbb{C} \perp \mathbb{D}} = \log\left(q / (q_{\mathbb{C}} \cdot q_{\mathbb{D}})\right) \geq 0$, which is a non-linear generalization of the classical correlation. Linear decoders can only generate Gaussian distributions, and the PCA solution induces orthogonal coordinates such that $q$ factorizes exactly into $q_{\mathbb{C}}$ and $q_{\mathbb{D}}$. EOFlows extend this to non-Gaussian distributions and non-linear mappings. When the decoder Jacobian has orthogonal columns everywhere, the resulting curvilinear coordinates become orthogonal, resembling polar coordinates in this example. Only in the orthogonal cases does $\mathcal{L}_{\mathbb{C} \perp \mathbb{D}}$ vanish everywhere, so that the constituent densities factorize "cleanly" as $q = q_{\mathbb{C}} \cdot q_{\mathbb{D}}$ (purple in the bottom row).
Figure 2: Multiple EOFlows trained on EMNIST with varying noise level $\sigma_{\boldsymbol{\epsilon}} \in \{0.01, 0.03, 0.1\}$ and varying Total Disentanglement strength $\lambda_\text{TC} \in [0, 10]$.
Figure 3: Average Jacobian column vectors $\mathbb{E}\left[\boldsymbol{J}_i\right]$ of selected latent dimensions, where each image is normalized for increased contrast. High-entropy dimensions (top row) are responsible for global factors such as slant, thickness, etc. Low-entropy dimensions (bottom row) indicate preprocessing artifacts in EMNIST.
Figure 4: Disentangling entangled digits. The left block shows four random inflated data samples (left to right); the right block mirrors the left block with a latent bottleneck of $C = 100$. Top row: entangled data samples with $\alpha=0.5$. Second/third row: the most important latent dimension is edited to the value $+2$/$-2$. Bottom row: superposition of the second and third rows, which closely resembles the original first row.
Figure 5: Manifold entropy spectra of multiple EOFlows trained on CelebA with $\sigma_{\boldsymbol{\epsilon}}=0.1$ and varying $\lambda_\text{TC}\in\{0,0.01,0.1,1.0\}$, where we additionally plot the PCA solution as a linearized EOFlow. The noise level becomes a lower bound on the manifold entropy and allows one to specify a natural cutoff between core and detail dimensions.
...and 22 more figures
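
The orthogonality condition in the Figure 1 caption (a decoder Jacobian with orthogonal columns, as for polar coordinates) can be checked numerically. The sketch below uses a Hadamard-inequality gap, $\sum_i \log \lVert \boldsymbol{J}_i \rVert - \tfrac{1}{2} \log\det(\boldsymbol{J}^\top \boldsymbol{J}) \geq 0$, which is zero exactly when the columns are orthogonal. This quantity is an assumption made here for illustration, in the spirit of Independent Mechanism Analysis; it is not claimed to be the paper's pointwise manifold mutual information, whose full definition is not reproduced on this page. The function names are hypothetical.

```python
import numpy as np


def column_orthogonality_gap(jacobian):
    """Hadamard gap: sum_i log||J_i|| - 0.5 * logdet(J^T J).

    Non-negative for any full-column-rank Jacobian, and zero if and only if
    its columns are mutually orthogonal. Illustrative stand-in only.
    """
    col_norms = np.linalg.norm(jacobian, axis=0)
    _, logdet = np.linalg.slogdet(jacobian.T @ jacobian)
    return np.sum(np.log(col_norms)) - 0.5 * logdet


def polar_jacobian(r, phi):
    """Jacobian of the decoder (r, phi) -> (r cos phi, r sin phi)."""
    return np.array([
        [np.cos(phi), -r * np.sin(phi)],
        [np.sin(phi), r * np.cos(phi)],
    ])


# Orthogonal curvilinear coordinates: the gap vanishes at every point.
gap_polar = column_orthogonality_gap(polar_jacobian(r=2.0, phi=0.7))

# A sheared linear decoder has non-orthogonal columns: the gap is strictly positive.
shear = np.array([[1.0, 0.8],
                  [0.0, 1.0]])
gap_shear = column_orthogonality_gap(shear)

print(gap_polar, gap_shear)
```

Evaluating the gap at several points of a non-linear decoder gives a pointwise picture of where its coordinate lines deviate from orthogonality.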

Theorems & Definitions (25)

Definition 1.1: Curvilinear coordinates
Definition 1.2: Curvilinear manifold
Definition 1.3: Manifold PDF
Definition 1.4: Pointwise Manifold Entropy
Definition 1.5: Pointwise Manifold Mutual Information
Definition 1.6: Maximum Manifold Likelihood objective
Definition 1.8
Definition 1.9
Definition 1.10
Theorem 2.1
...and 15 more
