Deep Generative Models through the Lens of the Manifold Hypothesis: A Survey and New Connections

Gabriel Loaiza-Ganem; Brendan Leigh Ross; Rasa Hosseinzadeh; Anthony L. Caterini; Jesse C. Cresswell

TL;DR

The survey reframes deep generative modeling through the manifold hypothesis, under which data lie on unknown low-dimensional manifolds embedded in high-dimensional ambient spaces. It identifies fundamental limitations of likelihood-based DGMs (e.g., VAEs, NFs, EBMs) when data are manifold-supported, notably unavoidable numerical instability of likelihoods and manifold overfitting. It also explains why diffusion models and models trained on learned latent representations capture manifold structure more effectively; in particular, DGMs trained on autoencoder representations can be interpreted as approximately minimizing Wasserstein distance. The authors unify several manifold-aware strategies (adding noise, weak-convergence losses such as Wasserstein distance and MMD, and two-step latent modeling) under a common lens, and they connect two-step approaches to optimal transport theory. They also discuss topological challenges and present tools (neural implicit manifolds, multi-chart manifolds, injective flows) for handling nontrivial topologies. Collectively, the work provides a theory-grounded roadmap for designing DGMs that respect manifold structure, with practical implications for diffusion models, latent diffusion, and topology-aware generative modeling.

Abstract

In recent years there has been increased interest in understanding the interplay between deep generative models (DGMs) and the manifold hypothesis. Research in this area focuses on understanding the reasons why commonly-used DGMs succeed or fail at learning distributions supported on unknown low-dimensional manifolds, as well as developing new models explicitly designed to account for manifold-supported data. This manifold lens provides both clarity as to why some DGMs (e.g. diffusion models and some generative adversarial networks) empirically surpass others (e.g. likelihood-based models such as variational autoencoders, normalizing flows, or energy-based models) at sample generation, and guidance for devising more performant DGMs. We carry out the first survey of DGMs viewed through this lens, making two novel contributions along the way. First, we formally establish that numerical instability of likelihoods in high ambient dimensions is unavoidable when modelling data with low intrinsic dimension. We then show that DGMs on learned representations of autoencoders can be interpreted as approximately minimizing Wasserstein distance: this result, which applies to latent diffusion models, helps justify their outstanding empirical results. The manifold lens provides a rich perspective from which to understand DGMs, and we aim to make this perspective more accessible and widespread.
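The first contribution mentioned above concerns likelihoods of full-dimensional models on data with low intrinsic dimension. The following is a minimal toy sketch of the underlying phenomenon, not the paper's theorem or proof. It assumes data lying exactly on the $x_1$-axis of $\mathbb{R}^2$ and an axis-aligned Gaussian model; the function name `avg_log_likelihood` and the parameter `sigma` are hypothetical names introduced only for illustration.

```python
import numpy as np


def avg_log_likelihood(x, sigma):
    """Average log-density of points x (shape (N, 2)) under N(0, diag(1, sigma^2)).

    Hypothetical helper for illustration; `sigma` is the model's standard
    deviation in the off-manifold direction x2.
    """
    log_norm = 0.5 * np.log(2.0 * np.pi)
    log_p_x1 = -0.5 * x[:, 0] ** 2 - log_norm
    log_p_x2 = -0.5 * (x[:, 1] / sigma) ** 2 - np.log(sigma) - log_norm
    return float(np.mean(log_p_x1 + log_p_x2))


rng = np.random.default_rng(0)

# Assumed toy data: a 1-dimensional manifold (the x1-axis) in D = 2 ambient dimensions.
x = np.stack([rng.standard_normal(1000), np.zeros(1000)], axis=1)

# Shrinking the off-manifold scale makes the model density spike on the manifold.
sigmas = [1.0, 1e-2, 1e-4, 1e-8]
lls = [avg_log_likelihood(x, s) for s in sigmas]

for s, ll in zip(sigmas, lls):
    print(f"sigma={s:.0e}  average log-likelihood={ll:.3f}")
```

Because every data point has $x_2 = 0$, the average log-likelihood in this sketch changes by exactly $-\log \sigma$ relative to $\sigma = 1$, so it grows without bound as $\sigma \to 0$, whatever the model does along the manifold.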

Paper Structure (74 sections, 8 theorems, 110 equations, 11 figures, 1 table)

Introduction
Notation and Setup
Notation
Ambient and latent spaces
Encoders and decoders
Probability
Network parameters
Calculus
Linear algebra
Setup
Background
Deep Generative Models on Known Manifolds
Manifold Learning
When are perfect reconstructions achievable?
The Change-of-Variables Formula
...and 59 more sections

Key Result

Theorem 1

Let $M \subset \mathcal{X}$ be a Borel set such that $\lambda_D(\mathop{\mathrm{cl}}\nolimits_\mathcal{X}(M))=0$, and let $\mathbb{P}^X_\dagger$ be a probability measure on $\mathcal{X}$ such that $\mathbb{P}^X_\dagger(M)=1$ and $\mathop{\mathrm{supp}}\nolimits(\mathbb{P}^X_\dagger) = \mathop{\mathr

Figures (11)

Figure 1: (a) Depiction of a full-dimensional density (i.e. a density in the "usual sense") $P^X$ on $\mathbb{R}^D$, with $D=2$. The probability assigned by $P^X$ to a region $A$ of $\mathbb{R}^D$ is its integral over the region, i.e. $\iint_A P^X(x_1, x_2) {\textnormal{d}} x_1 {\textnormal{d}} x_2$. (b) When the density $P^X_{\ast}$ is instead supported on a $d^\ast$-dimensional manifold $\mathcal{M}$ embedded in $\mathbb{R}^D$ (here, $d^\ast=1$ and $\mathcal{M}$ is a curve), the integral evaluates to zero, and thus $P^X_{\ast}$ is not a density in the "usual sense". (c) Formally, in order to recover the probability assigned to $A$ by the manifold-supported density $P^X_{\ast}$, the density must be integrated only over $\mathcal{M}$ using a volume form (${\textnormal{d}} \textrm{vol}_\mathcal{M}$) on the manifold, $\int_{A \cap \mathcal{M}} P^X_{\ast} {\textnormal{d}} \textrm{vol}_\mathcal{M}$ -- which in this case simply corresponds to a line integral.
Figure 2: Illustration of why autoencoders, by themselves, do not characterize $\mathcal{M}$ even if they achieve perfect reconstructions on it. The illustrative point $z \in \mathcal{Z} \setminus f_{\phi^\ast}(\mathcal{M})$ is such that $x=g_{\theta^\ast}(z) \notin \mathcal{M}$, so that the set of possible decoder outputs does not match $\mathcal{M}$, i.e. $g_{\theta^\ast}(\mathcal{Z}) \neq \mathcal{M}$ -- even though $\mathcal{M}$ is contained in $g_{\theta^\ast}(\mathcal{Z})$ due to the assumption of perfect reconstructions. Additionally, in this example $x \in \mathcal{X} \setminus \mathcal{M}$ is perfectly reconstructed, so that the set of perfectly reconstructed points does not match $\mathcal{M}$, i.e. $\{x \in \mathcal{X} \mid x = g_{\theta^\ast}(f_{\phi^\ast}(x))\} \neq \mathcal{M}$ -- even though $\mathcal{M}$ is contained in this set whenever $x = g_{\theta^\ast}(f_{\phi^\ast}(x))$ for every $x \in \mathcal{M}$.
Figure 3: Illustration of why KL divergences can be infinite in the manifold setting. (a) $P^X_\theta$ has full-dimensional support (light red region), while $P^X_{\ast}$ is supported on a lower-dimensional manifold $\mathcal{M}$ (blue curve). The model $P^X_\theta$ assigns probability $0$ to $A$, i.e. $\int_{A} P^X_\theta {\textnormal{d}} x = 0$, because the region $A$ has zero volume in $\mathcal{X}$. However, $P^X_{\ast}$ does not, since $\int_{A\cap\mathcal{M}} P^X_{\ast} {\textnormal{d}} \textrm{vol}_\mathcal{M} > 0$. We conclude that $\mathbb{KL} \left(P^X_{\ast} \, \Vert \, P^X_\theta \right) = \infty$. Meanwhile, $\int_{B\cap\mathcal{M}} P^X_{\ast} {\textnormal{d}} \textrm{vol}_\mathcal{M} = 0$ because $B\cap\mathcal{M}=\emptyset$, yet we have $\int_B P^X_\theta {\textnormal{d}} x > 0$, entailing that $\mathbb{KL} \left(P^X_\theta \, \Vert \, P^X_{\ast} \right) = \infty$. (b) Analogous example where now $P^X_\theta$ and $P^X_{\ast}$ are both supported on low-dimensional manifolds. Since $\mathcal{M}$ is not contained in the support of $P^X_\theta$, there exists a set $A$ to which $P^X_\theta$ assigns probability $0$ despite having positive probability under $P^X_{\ast}$, so that $\mathbb{KL} \left(P^X_{\ast} \, \Vert \, P^X_\theta \right) = \infty$.
Figure 4: The optimal transport problem can be visualized as the minimum cost of "transporting" the density $p$ over to the density $q$. Picturing $p$ and $q$ as piles of dirt, each dirt particle from $p$ must be moved so that it becomes part of $q$. Moving dirt from $x$ to $y$ incurs a cost given by $c(x,y)$. The joint distribution $\gamma$ of $(X,Y)$ can be thought of as specifying the "transport plan": the constraint that its $X$-marginal matches $p$ ensures the starting pile of dirt is $p$; the constraint that its $Y$-marginal matches $q$ ensures the final pile of dirt is $q$; and its $(Y|X=x)$-conditional -- illustrated with the black arrows in the figure -- specifies how the dirt at $x$ from $p$ is (potentially stochastically) allocated to dirt from $q$. The most efficient plan possible for shifting all the dirt has an overall cost $\mathbb{W}^c(p, q)$. This analogy explains why the Wasserstein-1 distance is sometimes called the earth mover's distance.
Figure 5: Illustration of manifold overfitting, where the $1$-dimensional $P^X_{\ast}$ (shades of blue) along a curve $\mathcal{M}$ in $2$-dimensional ambient space is improperly approximated. Each row shows a sequence of full-dimensional densities $P^X_{\theta_t}$ (red surfaces) having the property that their likelihood diverges to infinity on all of $\mathcal{M}$, yet each sequence approximates a different manifold-supported density $p_\dagger^X$ on $\mathcal{M}$: the top sequence will recover a bimodal distribution on $\mathcal{M}$ and the bottom sequence a trimodal one, despite $P^X_{\ast}$ being unimodal.
...and 6 more figures
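
The contrast between Figures 3 and 4 can be made concrete with a small numerical sketch. It is a toy illustration under stated assumptions, not code from the paper: two point masses on a 1-dimensional grid stand in for distributions with mismatched supports, and the helper names `kl_divergence` and `wasserstein_1` are hypothetical. The Wasserstein-1 distance uses the standard 1-dimensional identity $\mathbb{W}_1(p, q) = \int |F_p(x) - F_q(x)| \, {\textnormal{d}} x$, where $F_p$ and $F_q$ are the cumulative distribution functions.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions on a shared grid.

    Returns inf when p puts mass where q has none (mismatched supports).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def wasserstein_1(p, q, grid):
    """Wasserstein-1 distance on a sorted 1-D grid via the CDF-difference identity."""
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(grid)))


grid = np.linspace(0.0, 1.0, 101)  # grid spacing 0.01

# Assumed toy target: a point mass at grid index 50 (x = 0.5).
p = np.zeros(grid.size)
p[50] = 1.0

results = {}
for shift in (10, 1):
    # Model: a point mass `shift` grid cells away from the target.
    q = np.zeros(grid.size)
    q[50 + shift] = 1.0
    results[shift] = (kl_divergence(p, q), wasserstein_1(p, q, grid))
    print(f"shift={shift}: KL={results[shift][0]}, W1={results[shift][1]:.4f}")
```

In this sketch the KL divergence is infinite for both models, so it gives no signal that the second model is closer, whereas the Wasserstein-1 distance shrinks in proportion to the distance between the two point masses.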

Theorems & Definitions (16)

Theorem 1: Likelihood Instability of Deep Generative Models
proof
Proposition 1
proof
Definition 1: Continuity Set
Lemma 1: Portmanteau
Theorem 1: Likelihood Instability of Deep Generative Models
Lemma 2
proof
Lemma 3
...and 6 more
