From Core to Detail: Unsupervised Disentanglement with Entropy-Ordered Flows
Daniel Galperin, Ullrich Köthe
TL;DR
Entropy-ordered flows (EOFlows) address unsupervised disentanglement by ordering latent coordinates by explained entropy and by introducing a Maximum Manifold Likelihood objective that jointly trains density estimation, manifold learning, and disentanglement. The core–detail decomposition yields a flexible bottleneck chosen at inference time (the top $C$ core dimensions) and a tractable stochastic estimator for the total disentanglement term, which enables scalable training on high-dimensional data. In experiments on EMNIST, Entangled Digits, and CelebA, EOFlows produce interpretable, semantically meaningful latent factors, enable strong compression and denoising, and recover a PCA-like solution in the linear Gaussian limit while retaining non-linear expressiveness. The work offers a principled, information-theoretic route to stable, disentangled, and compressible representations, with practical benefits for visualization, editing, and rate–distortion trade-offs in generative modeling.
Abstract
Learning unsupervised representations that are both semantically meaningful and stable across runs remains a central challenge in modern representation learning. We introduce entropy-ordered flows (EOFlows), a normalizing-flow framework that orders latent dimensions by their explained entropy, analogously to PCA's explained variance. This ordering enables adaptive injective flows: after training, one may retain only the top C latent variables to form a compact core representation while the remaining variables capture fine-grained detail and noise, with C chosen flexibly at inference time rather than fixed during training. EOFlows build on insights from Independent Mechanism Analysis, Principal Component Flows and Manifold Entropic Metrics. We combine likelihood-based training with local Jacobian regularization and noise augmentation into a method that scales well to high-dimensional data such as images. Experiments on the CelebA dataset show that our method uncovers a rich set of semantically interpretable features, allowing for high compression and strong denoising.
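The ordering-and-truncation idea can be illustrated in the simplest setting, a latent space with independent Gaussian coordinates. The sketch below is not the EOFlows training procedure and does not implement the paper's explained-entropy estimator. It only assumes that each coordinate's entropy can be summarized by the closed-form Gaussian expression $\tfrac{1}{2}\ln(2\pi e\,\sigma^2)$, which is monotone in the variance, so the resulting ranking coincides with PCA's explained-variance ranking. The function names (`gaussian_entropy`, `entropy_order`, `keep_core`) and the example variances are hypothetical.

```python
import math


def gaussian_entropy(variance):
    """Differential entropy (in nats) of a 1-D Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def entropy_order(variances):
    """Return latent indices sorted from highest to lowest Gaussian entropy."""
    return sorted(
        range(len(variances)),
        key=lambda i: gaussian_entropy(variances[i]),
        reverse=True,
    )


def keep_core(z, order, num_core, fill=0.0):
    """Keep the num_core highest-ranked coordinates of z.

    The remaining (detail) coordinates are replaced by `fill`, assumed
    here to be the latent mean of a zero-centered latent distribution.
    """
    core = set(order[:num_core])
    return [value if i in core else fill for i, value in enumerate(z)]


# Hypothetical per-coordinate latent variances (illustrative values only).
variances = [0.05, 4.0, 0.5, 1.5]
order = entropy_order(variances)

# The bottleneck width is chosen after the ordering is known,
# mirroring the choice of C at inference time.
z = [0.2, -1.7, 0.6, 1.1]
z_core = keep_core(z, order, num_core=2)

print(order)   # [1, 3, 2, 0]
print(z_core)  # [0.0, -1.7, 0.0, 1.1]
```

Because the ranking is computed once and the truncation is a separate step, the same ordered latent code can be cut at any `num_core` without recomputing anything, which is the property the abstract attributes to choosing C at inference time. Note that differential entropy can be negative for small variances, so this sketch only ranks coordinates and does not report an "explained fraction".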
