Optimal Stopping in Latent Diffusion Models
Yu-Han Wu, Quentin Berthet, Gérard Biau, Claire Boyer, Romuald Elie, Pierre Marion
TL;DR
This work analyzes how the latent dimension in Latent Diffusion Models (LDMs) interacts with the stopping time of the backward diffusion to affect sample quality. Using a Gaussian data model with a linear autoencoder, it derives how the Fréchet (Wasserstein-2) distance between the data and generated distributions evolves with the stopping time. The analysis reveals a time-dependent trade-off: low latent dimensions benefit from earlier stopping, while higher dimensions require later stopping. It further develops results for score-matching empirical risk minimizers (ERMs) under norm constraints and extends the findings to general Gaussian covariances, showing that PCA-like projections are optimal on certain time intervals. The results offer a principled guideline for choosing the latent dimension and stopping time to optimize generation quality while managing computation, and they are supported by experiments on real data such as CelebA.
Abstract
We identify and analyze a surprising phenomenon of Latent Diffusion Models (LDMs): the final steps of the diffusion can degrade sample quality. In contrast to conventional arguments that justify early stopping for numerical stability, this phenomenon is intrinsic to the dimensionality reduction in LDMs. We provide a principled explanation by analyzing the interaction between latent dimension and stopping time. Under a Gaussian framework with linear autoencoders, we characterize the conditions under which early stopping is needed to minimize the distance between generated and target distributions. More precisely, we show that lower-dimensional representations benefit from earlier termination, whereas higher-dimensional latent spaces require later stopping. We further establish that the latent dimension interacts with other hyperparameters of the problem, such as constraints on the score-matching parameters. Experiments on synthetic and real datasets illustrate these properties, underlining that early stopping can improve generative quality. Together, our results offer a theoretical foundation for understanding how the latent dimension influences sample quality, and highlight the stopping time as a key hyperparameter in LDMs.
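The distance between generated and target distributions used in this Gaussian setting, the Fréchet (Wasserstein-2) distance, has a closed form when both distributions are Gaussian, which is one reason Gaussian models are convenient for this kind of analysis. The snippet below is a minimal sketch of that closed form, assuming NumPy and Gaussians given by mean vectors and covariance matrices; the function names `psd_sqrt` and `gaussian_w2_squared` are illustrative and not taken from the paper.

```python
import numpy as np


def psd_sqrt(a: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    a = (a + a.T) / 2.0  # symmetrize to absorb round-off
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # tiny negative eigenvalues are numerical noise
    return (v * np.sqrt(w)) @ v.T  # V diag(sqrt(w)) V^T


def gaussian_w2_squared(m1, s1, m2, s2) -> float:
    """Squared W2 (Frechet) distance between N(m1, s1) and N(m2, s2).

    W2^2 = ||m1 - m2||^2 + tr(s1) + tr(s2) - 2 tr((s2^{1/2} s1 s2^{1/2})^{1/2})
    """
    m1 = np.atleast_1d(np.asarray(m1, dtype=float))
    m2 = np.atleast_1d(np.asarray(m2, dtype=float))
    s1 = np.atleast_2d(np.asarray(s1, dtype=float))
    s2 = np.atleast_2d(np.asarray(s2, dtype=float))
    root_s2 = psd_sqrt(s2)
    cross = psd_sqrt(root_s2 @ s1 @ root_s2)
    value = np.sum((m1 - m2) ** 2) + np.trace(s1) + np.trace(s2) - 2.0 * np.trace(cross)
    return float(max(value, 0.0))  # clamp small negative round-off to zero


if __name__ == "__main__":
    # Diagonal covariances: the distance reduces to squared gaps of standard deviations.
    print(gaussian_w2_squared([0, 0], np.diag([4.0, 9.0]), [0, 0], np.eye(2)))
```

For diagonal (commuting) covariances the formula reduces to a sum of squared differences of per-coordinate standard deviations, so the example above gives approximately (2 - 1)^2 + (3 - 1)^2 = 5.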
