Latent Schrödinger Bridge Diffusion Model for Generative Learning

Yuling Jiao, Lican Kang, Huazhen Lin, Jin Liu, Heng Zuo

TL;DR

This work introduces a latent diffusion framework grounded in the Schrödinger bridge to enable principled generative learning with encoder–decoder pre-training. By separating pre-training from the latent diffusion process, the method accommodates a shift between the pre-training and target distributions while drawing on large-scale pretrained models, and it builds a latent-space SDE that transports a Gaussian-convolved encoder distribution to the encoder target. The authors establish end-to-end Wasserstein-2 convergence rates and show that these rates mitigate the curse of dimensionality intrinsic to raw data, achieving minimax-optimal rates under standard assumptions. The theory is developed for latent Schrödinger-bridge diffusion but is stated to generalize to broader diffusion models, supporting both the practical feasibility and the theoretical rigor of latent diffusion with encoder–decoder pre-training.

Abstract

This paper aims to conduct a comprehensive theoretical analysis of current diffusion models. We introduce a novel generative learning methodology utilizing the Schrödinger bridge diffusion model in latent space as the framework for theoretical exploration in this domain. Our approach commences with the pre-training of an encoder-decoder architecture using data originating from a distribution that may diverge from the target distribution, thus facilitating the accommodation of a large sample size through the utilization of pre-existing large-scale models. Subsequently, we develop a diffusion model within the latent space utilizing the Schrödinger bridge framework. Our theoretical analysis encompasses the establishment of end-to-end error analysis for learning distributions via the latent Schrödinger bridge diffusion model. Specifically, we control the second-order Wasserstein distance between the generated distribution and the target distribution. Furthermore, our obtained convergence rates effectively mitigate the curse of dimensionality, offering robust theoretical support for prevailing diffusion models.
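
The error metric named in the abstract, the second-order Wasserstein distance, is $W_2(\mu,\nu) = \big(\inf_{\gamma \in \Pi(\mu,\nu)} \int \|x-y\|^2 \, d\gamma(x,y)\big)^{1/2}$, where $\Pi(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$. As a minimal illustration only, and not the paper's estimator, the sketch below computes $W_2$ between two equal-size one-dimensional samples, where the optimal coupling is known to pair sorted values. The function name `empirical_w2_1d` and the Gaussian toy data are assumptions made for this example.

```python
import numpy as np


def empirical_w2_1d(x, y):
    """W2 between two equal-size 1-D empirical distributions.

    In one dimension the optimal quadratic-cost coupling is monotone,
    so it pairs the i-th smallest value of x with the i-th smallest of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("expected two 1-D samples of equal size")
    return float(np.sqrt(np.mean((x - y) ** 2)))


# Toy data (hypothetical): a "target" sample and a shifted "generated" sample.
rng = np.random.default_rng(0)
target = rng.normal(loc=0.0, scale=1.0, size=5000)
generated = rng.normal(loc=0.5, scale=1.0, size=5000)
print(empirical_w2_1d(generated, target))
```

For two Gaussians with equal variance the population $W_2$ equals the absolute difference of the means, so the printed value should be close to, but not exactly, 0.5. In higher dimensions the sorted pairing no longer applies and a general optimal transport solver is needed.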

Paper Structure

This paper contains 42 sections, 32 theorems, 423 equations, 1 algorithm.

Key Result

Proposition 2.1

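[Léonard, 2014] Let $\mathscr{L}$ be the Lebesgue measure. If $\nu, \mu \ll \mathscr{L}$, then the Schrödinger bridge problem (SBP) admits a unique solution $\mathbf{Q}^* = f^*(Z_0)g^*(Z_1)\mathbf{P}$, where $f^*$ and $g^*$ are $\mathscr{L}$-measurable nonnegative functions satisfying the Schrödinger system [equations not preserved in source]. Furthermore, the pair $(\mathbf{Q}^*_{t},\mathbf{v}^*_{t})$, with $\mathbf{v}^*_{t}$ given by [equation not preserved in source], solves the minimum action problem [objective not preserved in source] s.t. [constraints not preserved in source].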
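
The displayed equations of Proposition 2.1 are missing from this extract. For orientation, the block below gives the form in which this result is commonly stated in the Schrödinger bridge literature. It is a sketch under assumptions, not a transcription of the paper: it takes $\mathbf{P}$ to be the path measure of a standard Brownian motion, takes $\nu$ and $\mu$ to be the marginals at $t=0$ and $t=1$, and uses $\mathbf{Q}_t$ for the time-$t$ marginal density. The paper's own displays may differ in noise scale, marginal ordering, or normalization.

```latex
% Schrödinger system (standard form; assumed, not taken from the source)
\begin{align*}
f^*(x)\,\mathbb{E}_{\mathbf{P}}\!\left[g^*(Z_1)\mid Z_0 = x\right]
  &= \frac{d\nu}{d\mathscr{L}}(x), \quad \mathscr{L}\text{-a.e.},\\
g^*(y)\,\mathbb{E}_{\mathbf{P}}\!\left[f^*(Z_0)\mid Z_1 = y\right]
  &= \frac{d\mu}{d\mathscr{L}}(y), \quad \mathscr{L}\text{-a.e.}
\end{align*}

% Drift of the bridge (assumed form)
\[
\mathbf{v}^*_t(x) = \nabla_x \log \mathbb{E}_{\mathbf{P}}\!\left[g^*(Z_1)\mid Z_t = x\right]
\]

% Minimum action problem with a Fokker--Planck constraint (assumed form)
\begin{align*}
\min_{\mathbf{Q}_t,\,\mathbf{v}_t}\;
  & \int_0^1 \mathbb{E}_{z\sim \mathbf{Q}_t}\!\left[\|\mathbf{v}_t(z)\|^2\right] dt\\
\text{s.t.}\;
  & \partial_t \mathbf{Q}_t = -\nabla\cdot(\mathbf{Q}_t \mathbf{v}_t) + \tfrac{1}{2}\Delta \mathbf{Q}_t,
  \qquad \mathbf{Q}_0 = \nu,\quad \mathbf{Q}_1 = \mu .
\end{align*}
```

Under these assumptions the constraint is the Fokker-Planck equation of the SDE $dZ_t = \mathbf{v}_t(Z_t)\,dt + dW_t$, so the minimizer is the drift that steers Brownian motion from $\nu$ to $\mu$ at the least quadratic cost.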

Theorems & Definitions (62)

  • Proposition 2.1
  • Proposition 2.2
  • Definition 2.1: ReLU FNNs
  • Definition 2.2: Wasserstein distance
  • Definition 2.3: Covering number
  • Definition 2.4: ($\beta$, $R$)-Hölder Class
  • Remark 4.1
  • Lemma 4.1
  • Theorem 4.2
  • Lemma 4.3: Approximation Error
  • ...and 52 more