Latent Schrödinger Bridge Diffusion Model for Generative Learning
Yuling Jiao, Lican Kang, Huazhen Lin, Jin Liu, Heng Zuo
TL;DR
This work introduces a latent diffusion framework grounded in the Schrödinger bridge to enable principled generative learning with encoder–decoder pre-training. By separating pre-training from the latent diffusion process, the method tolerates a shift between the pre-training distribution and the target distribution and can reuse large-scale pretrained models. It builds a latent-space SDE that transports a Gaussian-convolved encoder distribution to the encoder target. The authors establish end-to-end Wasserstein-2 convergence rates and show that these rates mitigate the curse of dimensionality intrinsic to raw data, achieving minimax-optimal rates under standard assumptions. The theory is developed for latent Schrödinger-bridge diffusion but is stated to generalize to broader diffusion models, underscoring both the practical feasibility and the theoretical rigor of latent diffusion with encoder–decoder pre-training.
Abstract
This paper develops a theoretical analysis of current diffusion models. We introduce a generative learning method that uses the Schrödinger bridge diffusion model in latent space as the framework for this analysis. The approach first pre-trains an encoder-decoder architecture on data drawn from a distribution that may differ from the target distribution, which makes it possible to exploit the large sample sizes behind existing large-scale models. We then build a diffusion model in the latent space using the Schrödinger bridge framework. Our analysis establishes an end-to-end error analysis for learning distributions with the latent Schrödinger bridge diffusion model. Specifically, we control the second-order Wasserstein distance between the generated distribution and the target distribution. The resulting convergence rates mitigate the curse of dimensionality, providing theoretical support for prevailing diffusion models.
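
To make the error metric concrete, the following is a minimal sketch of the second-order Wasserstein distance between two empirical distributions. It is an illustration only, not the paper's estimator: it assumes one-dimensional samples of equal size, in which case the optimal coupling simply pairs sorted samples. The function name `empirical_w2_1d` and the toy "generated" sample are hypothetical.

```python
import math
import random


def empirical_w2_1d(xs, ys):
    """Second-order Wasserstein distance between two equal-size 1-D samples.

    Assumption for this sketch: both samples are one-dimensional and have the
    same size, so the optimal transport plan matches order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    # In one dimension the optimal coupling pairs the sorted samples.
    paired = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in paired) / len(xs))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins: a "target" sample and a "generated" sample shifted by 0.5.
    target = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    generated = [x + 0.5 for x in target]
    print(empirical_w2_1d(target, generated))
```

Because the toy generated sample is a pure shift of the target sample by 0.5, the printed distance is 0.5 up to floating-point error. In higher dimensions the distance requires solving an optimal transport problem and has no such closed form.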
