Distributional Autoencoders Know the Score

Andrej Leban

TL;DR

The paper presents the Distributional Principal Autoencoder (DPA), which simultaneously learns a data distribution and a nonlinear, PCA-like manifold representation, with an exact score–geometry identity linking encoder level-set normals to the data score $s_{\text{data}}(y)=\nabla_y \log P_{\text{data}}(y)$. It proves that, when data lie on a parameterizable manifold, extraneous latent coordinates beyond the intrinsic dimension are uninformative given the informative coordinates, enabling intrinsic-dimension discovery via conditional independence. The authors validate these results with experiments on a standard Normal, a Gaussian mixture, and the Müller–Brown potential, showing close alignment between level-set normals and the data score and demonstrating recovery of the minimum free-energy path (MFEP) from a single, unsupervised fit. This work unifies distribution learning and dimensionality reduction under exact guarantees, with practical implications for efficient molecular simulations and nonlinear manifold learning.
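
To make the score concrete: for a standard Normal, $\log P(y) = -\tfrac{1}{2}\|y\|^2 + \text{const}$, so the score is $-y$. The sketch below checks this numerically and measures how well the score aligns with a level-set normal. The radial encoder `radial_encoder` is a hypothetical stand-in chosen for illustration, not the fitted DPA encoder, and the test point is arbitrary.

```python
import math

def log_density_std_normal(y):
    """Log-density of a standard Normal in len(y) dimensions."""
    d = len(y)
    return -0.5 * sum(v * v for v in y) - 0.5 * d * math.log(2 * math.pi)

def numerical_grad(f, y, h=1e-5):
    """Central finite-difference gradient of a scalar function f at y."""
    grad = []
    for i in range(len(y)):
        up = list(y)
        dn = list(y)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def radial_encoder(y):
    """Hypothetical scalar encoder (an assumption): its level sets are circles."""
    return math.sqrt(sum(v * v for v in y))

y = [0.8, -1.3]                                    # arbitrary test point
score = numerical_grad(log_density_std_normal, y)  # should be close to -y
normal = numerical_grad(radial_encoder, y)         # level-set normal at y
# The sign of a level-set normal is arbitrary, so compare up to sign.
alignment = abs(cosine(score, normal))
```

Here `alignment` equals 1 up to numerical error because both vectors are radial. An unsigned cosine of this kind is one simple way to quantify the score alignment reported in the experiments; the paper's exact metric may differ.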

Abstract

The Distributional Principal Autoencoder (DPA) combines distributionally correct reconstruction with principal-component-like interpretability of the encodings. In this work, we provide exact theoretical guarantees on both fronts. First, we derive a closed-form relation linking each optimal level-set geometry to the data-distribution score. This result explains DPA's empirical ability to disentangle factors of variation of the data and allows the score to be recovered directly from samples. When the data follows the Boltzmann distribution, we demonstrate that this relation yields an approximation of the minimum free-energy path for the Müller–Brown potential in a single fit. Second, we prove that if the data lies on a manifold that can be approximated by the encoder, latent components beyond the manifold dimension are conditionally independent of the data distribution, carrying no additional information, and thus reveal the intrinsic dimension. Together, these results show that a single model can simultaneously learn the data distribution and its intrinsic dimension with exact guarantees, unifying two longstanding goals of unsupervised learning.
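
The Boltzmann case can be made concrete. If $P(x) \propto \exp(-U(x)/k_BT)$, then the score is $-\nabla U(x)/k_BT$, so aligning with the score is the same as aligning with the potential gradient. The sketch below evaluates this for the Müller–Brown potential using the commonly quoted parameter set. The temperature `kT=10.0` and the function names are assumptions for illustration, not values taken from the paper.

```python
import math

# Commonly quoted Müller–Brown parameters (four Gaussian-like terms).
A = (-200.0, -100.0, -170.0, 15.0)
a = (-1.0, -1.0, -6.5, 0.7)
b = (0.0, 0.0, 11.0, 0.6)
c = (-10.0, -10.0, -6.5, 0.7)
X0 = (1.0, 0.0, -0.5, -1.0)
Y0 = (0.0, 0.5, 1.5, 1.0)

def potential(x, y):
    """Müller–Brown potential U(x, y)."""
    total = 0.0
    for k in range(4):
        dx, dy = x - X0[k], y - Y0[k]
        total += A[k] * math.exp(a[k] * dx * dx + b[k] * dx * dy + c[k] * dy * dy)
    return total

def potential_grad(x, y):
    """Analytic gradient (dU/dx, dU/dy)."""
    gx = gy = 0.0
    for k in range(4):
        dx, dy = x - X0[k], y - Y0[k]
        term = A[k] * math.exp(a[k] * dx * dx + b[k] * dx * dy + c[k] * dy * dy)
        gx += term * (2.0 * a[k] * dx + b[k] * dy)
        gy += term * (b[k] * dx + 2.0 * c[k] * dy)
    return gx, gy

def boltzmann_score(x, y, kT=10.0):
    """Score of P ∝ exp(-U/kT); kT=10.0 is an assumed temperature."""
    gx, gy = potential_grad(x, y)
    return -gx / kT, -gy / kT

# Score at an arbitrary point between the basins.
sx, sy = boltzmann_score(-0.5, 1.0)
```

Because the score is just a rescaled negative gradient, the black arrows in the Müller–Brown figure (potential gradient) and the data score point along the same line at every point.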

Paper Structure

This paper contains 44 sections, 8 theorems, 144 equations, 7 figures, 6 tables.

Key Result

Lemma 2.5

For any $\beta > 0$, assume Assumptions manuscript:asmptn:global and define: where $\eta$ is a perturbation (function), and $\delta$ is the Dirac delta distribution. Next, define the level-set mass: Then, for almost every sample $X\sim P_{\mathrm{data}}$ whose level set $\mathcal{L}_{e^*(X)}$ satisfies Assumption manuscript:asmptn:local_rank, and any $\eta$ in the same function class as $e$, we

Figures (7)

  • Figure 1: Gaussian examples. (a) Standard Normal; (b) Gaussian mixture. Red contours: data density; black arrows: score. Left: first latent; right: second.
  • Figure 2: Müller–Brown potential: encoder level sets and comparisons. Red contours: potential; black arrows: potential gradient; purple: the MFEP. (a) First (left) and second (right) DPA components. (b) First component for an Autoencoder (left) and a VAE (right).
  • Figure 3: The data for the Gaussian examples.
  • Figure 4: The data for the Müller–Brown potential example. Note the dearth of samples in between the potential minima.
  • Figure 5: Signed score alignment: sign flips due to the inaccuracy of estimating the level-set statistics.
  • ...and 2 more figures

Theorems & Definitions (13)

  • Definition 2.1: Oracle reconstructed distribution -- ORD, Definition 1 in shen_distributional_2024
  • Definition 2.2: DPA encoder optimization objective, Eq. 4 in shen_distributional_2024
  • Lemma 2.5: General integral balance for an optimal encoder
  • Theorem 2.6: When $\beta=2$, the optimal encoder's level sets align with the data score
  • Corollary 2.7: Consequence of Theorem 2.6 for extrema of the data distribution
  • Definition 3.1: $K^\prime$-parameterizable manifold
  • Proposition 3.2: An exactly parameterizable manifold is $K^\prime$-parameterizable
  • Definition 3.3: $K^\prime$-best-approximating encoder
  • Theorem 3.4: Extraneous latents of a $K^\prime$-best‑approximating encoder are uninformative
  • ...and 3 more