Distributional Autoencoders Know the Score
Andrej Leban
TL;DR
The paper presents the Distributional Principal Autoencoder (DPA), which simultaneously learns a data distribution and a nonlinear, PCA-like manifold representation. It establishes an exact score–geometry identity linking the normals of the encoder's level sets to the data score $s_{\text{data}}(y)=\nabla_y \log P_{\text{data}}(y)$. It also proves that, when the data lie on a parameterizable manifold, latent coordinates beyond the intrinsic dimension are uninformative given the informative coordinates, which enables intrinsic-dimension discovery via conditional independence. The authors validate these results with experiments on Normal data, Gaussian mixtures, and the Müller–Brown potential, showing close alignment between level-set normals and the data score and demonstrating recovery of the minimum free-energy path (MFEP) from a single, unsupervised fit. This work unifies distribution learning and dimensionality reduction under exact guarantees, with practical implications for efficient molecular simulations and nonlinear manifold learning.
Abstract
The Distributional Principal Autoencoder (DPA) combines distributionally correct reconstruction with principal-component-like interpretability of the encodings. In this work, we provide exact theoretical guarantees on both fronts. First, we derive a closed-form relation linking each optimal level-set geometry to the data-distribution score. This result explains DPA's empirical ability to disentangle factors of variation of the data and allows the score to be recovered directly from samples. When the data follow the Boltzmann distribution, we demonstrate that this relation yields an approximation of the minimum free-energy path for the Müller–Brown potential in a single fit. Second, we prove that if the data lie on a manifold that can be approximated by the encoder, latent components beyond the manifold dimension are conditionally independent of the data distribution (that is, they carry no additional information) and thus reveal the intrinsic dimension. Together, these results show that a single model can simultaneously learn the data distribution and its intrinsic dimension with exact guarantees, unifying two longstanding goals of unsupervised learning.
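To make the notion of a data-distribution score concrete, the following is a minimal sketch for a case where the score is known in closed form. It is not the DPA method itself: it only evaluates $\nabla_y \log P(y)$ for a multivariate Normal, where the score equals $-\Sigma^{-1}(y-\mu)$, and checks it against a finite-difference gradient of the log-density. The function names and the values of `mean`, `cov`, and `y` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_score(y, mean, cov):
    """Closed-form score of N(mean, cov): grad_y log p(y) = -cov^{-1} (y - mean)."""
    return -np.linalg.solve(cov, y - mean)

def gaussian_log_density(y, mean, cov):
    """Log-density of N(mean, cov), including the normalizing constant."""
    d = y.shape[0]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + d * np.log(2.0 * np.pi))

def finite_difference_score(log_density, y, eps=1e-5):
    """Central-difference approximation of grad_y log p(y)."""
    grad = np.zeros_like(y, dtype=float)
    for i in range(y.shape[0]):
        step = np.zeros_like(y, dtype=float)
        step[i] = eps
        grad[i] = (log_density(y + step) - log_density(y - step)) / (2.0 * eps)
    return grad

# Illustrative (hypothetical) parameters and evaluation point.
mean = np.array([0.5, -1.0])
cov = np.array([[2.0, 0.3], [0.3, 0.5]])
y = np.array([1.0, 0.2])

exact = gaussian_score(y, mean, cov)
approx = finite_difference_score(lambda v: gaussian_log_density(v, mean, cov), y)
print("closed-form score:      ", exact)
print("finite-difference score:", approx)
```

The same definition applies to a Boltzmann density $P(y) \propto \exp(-\beta U(y))$, for which the score is $-\beta \nabla_y U(y)$; the normalizing constant drops out because it does not depend on $y$.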
