Deep Generative Models through the Lens of the Manifold Hypothesis: A Survey and New Connections
Gabriel Loaiza-Ganem, Brendan Leigh Ross, Rasa Hosseinzadeh, Anthony L. Caterini, Jesse C. Cresswell
TL;DR
This survey reframes deep generative modeling through the manifold hypothesis, which holds that data often lie on unknown low-dimensional manifolds embedded in high-dimensional spaces. It identifies fundamental limitations of likelihood-based DGMs (e.g., VAEs, NFs, EBMs) on manifold-supported data, notably manifold overfitting and a numerical instability of likelihoods that is shown to be unavoidable in high ambient dimensions. It also shows that DGMs trained on learned autoencoder representations, including latent diffusion models, can be interpreted as approximately minimizing Wasserstein distance, which helps explain why they capture manifold structure well. The authors unify several manifold-aware strategies (adding noise, weak-convergence losses such as Wasserstein and MMD, and two-step latent modeling) under a common lens, and they connect two-step approaches to optimal transport theory. They also discuss topological challenges and present tools (neural implicit manifolds, multi-chart manifolds, injective flows) for handling nontrivial topologies. Collectively, the work provides a theory-grounded roadmap for designing DGMs that respect manifold structure, with practical implications for diffusion models, latent diffusion, and topology-aware generative modeling.
Abstract
In recent years there has been increased interest in understanding the interplay between deep generative models (DGMs) and the manifold hypothesis. Research in this area focuses on understanding the reasons why commonly-used DGMs succeed or fail at learning distributions supported on unknown low-dimensional manifolds, as well as developing new models explicitly designed to account for manifold-supported data. This manifold lens provides both clarity as to why some DGMs (e.g. diffusion models and some generative adversarial networks) empirically surpass others (e.g. likelihood-based models such as variational autoencoders, normalizing flows, or energy-based models) at sample generation, and guidance for devising more performant DGMs. We carry out the first survey of DGMs viewed through this lens, making two novel contributions along the way. First, we formally establish that numerical instability of likelihoods in high ambient dimensions is unavoidable when modelling data with low intrinsic dimension. We then show that DGMs on learned representations of autoencoders can be interpreted as approximately minimizing Wasserstein distance: this result, which applies to latent diffusion models, helps justify their outstanding empirical results. The manifold lens provides a rich perspective from which to understand DGMs, and we aim to make this perspective more accessible and widespread.
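As a toy illustration of why likelihoods behave badly on manifold-supported data, the sketch below places points exactly on a line in the plane (intrinsic dimension 1, ambient dimension 2) and evaluates a Gaussian density whose off-line standard deviation shrinks toward zero. This is not the paper's formal result or setup: the function name `gaussian_avg_loglik`, the choice of a diagonal Gaussian, and the specific values of `sigma_off` are all assumptions made for illustration. The average log-likelihood grows without bound as the density collapses onto the line, regardless of how well the distribution along the line is modeled.

```python
import numpy as np


def gaussian_avg_loglik(x, sigma_off):
    """Average log-density of points x (shape (n, 2)) under N(0, diag(1, sigma_off^2)).

    Hypothetical helper for illustration: the first axis runs along the
    data manifold, the second axis is the off-manifold direction.
    """
    var = np.array([1.0, sigma_off ** 2])
    # Log normalizing constant of a 2D diagonal Gaussian.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * var))
    # Quadratic term; the off-manifold coordinate is exactly zero for our data.
    quad = -0.5 * np.sum(x ** 2 / var, axis=1)
    return float(np.mean(log_norm + quad))


rng = np.random.default_rng(0)
t = rng.standard_normal(1000)
# Data supported on the line y = 0: a 1D manifold inside a 2D ambient space.
data = np.stack([t, np.zeros_like(t)], axis=1)

# Shrink the off-manifold standard deviation (illustrative values).
sigmas = [1.0, 1e-2, 1e-4, 1e-8]
logliks = [gaussian_avg_loglik(data, s) for s in sigmas]

for s, ll in zip(sigmas, logliks):
    print(f"sigma_off={s:g}  average log-likelihood={ll:.3f}")
```

In this sketch each reduction of `sigma_off` adds `-log(sigma_off)` to the average log-likelihood, so a maximum-likelihood objective keeps rewarding ever sharper densities. That unboundedness is one informal way to see why densities become numerically delicate when the intrinsic dimension is lower than the ambient one.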
