Variational Schrödinger Momentum Diffusion

Kevin Rojas, Yixin Tan, Molei Tao, Yuriy Nevmyvaka, Wei Deng

TL;DR

This paper addresses the scalability gap in transport-enabled diffusion models by introducing Variational Schrödinger Momentum Diffusion (VSMD), a simulation-free framework that uses linearized variational forward scores and a critical-damping transform to stabilize training. By formulating VSMD as an adaptive multivariate diffusion with velocity-augmented states, the authors derive forward–backward SDEs, develop a stochastic-approximation-based algorithm that adaptively optimizes transport, and prove convergence properties. Empirical results show efficient generation of anisotropic shapes, fast convergence, and competitive performance on time-series forecasting and CIFAR-10 image generation, without warm-up trajectories or complex denoising. The approach offers scalable OT-enabled generation with less reliance on forward simulation, making it practical for real-world data and multimodal generation tasks.

Abstract

The momentum Schrödinger Bridge (mSB) has emerged as a leading method for accelerating generative diffusion processes and reducing transport costs. However, because mSB is not simulation-free, it incurs high training costs that limit its scalability. To balance transport properties and scalability, we introduce variational Schrödinger momentum diffusion (VSMD), which employs linearized forward score functions (variational scores) to eliminate the dependence on simulated forward trajectories. Our approach leverages a multivariate diffusion process with adaptively transport-optimized variational scores. Additionally, we apply a critical-damping transform that stabilizes training by removing the need to estimate scores for both velocity and samples. Theoretically, we prove the convergence of samples generated with optimal variational scores and momentum diffusion. Empirical results demonstrate that VSMD efficiently generates anisotropic shapes while maintaining transport efficacy, outperforms overdamped alternatives, and avoids complex denoising processes. Our approach also scales effectively to real-world data, achieving competitive results in time-series and image generation.
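The simulation-free property rests on linearity: when the forward drift, including any linearized (variational) score term, is linear in the augmented state (position x, velocity v), the conditional law of the state at time t given the initial state is Gaussian, so its mean and covariance can be computed in closed form instead of by simulating trajectories. The sketch below illustrates this on a one-dimensional, CLD-style momentum SDE, using Van Loan's block-matrix exponential for the covariance. The parametrization (beta, gamma, mass), the damping-ratio formula for that parametrization, and all helper names are assumptions for illustration; this is not the paper's exact process, and it implements neither the paper's critical-damping transform nor its learned variational scores.

```python
import numpy as np


def expm(A: np.ndarray, terms: int = 20) -> np.ndarray:
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, ord=1)
    # Choose s so that ||A / 2**s||_1 <= 1/2, keeping the Taylor series accurate.
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A_scaled = A / (2.0 ** s)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A_scaled / k
        result = result + term
    # Undo the scaling: exp(A) = exp(A / 2**s) ** (2**s).
    for _ in range(s):
        result = result @ result
    return result


def momentum_drift(beta: float, gamma: float, mass: float) -> np.ndarray:
    """Drift matrix F of an assumed CLD-style linear SDE on a = (x, v):

    dx = (beta / mass) v dt
    dv = (-beta x - (gamma * beta / mass) v) dt + sqrt(2 * gamma * beta) dW
    """
    return beta * np.array([[0.0, 1.0 / mass], [-1.0, -gamma / mass]])


def damping_ratio(gamma: float, mass: float) -> float:
    # x'' + (gamma*beta/mass) x' + (beta**2/mass) x = 0  =>  zeta = gamma / (2 sqrt(mass)).
    return gamma / (2.0 * np.sqrt(mass))


def gaussian_transition(F: np.ndarray, Q: np.ndarray, t: float):
    """Closed-form transition of da = F a dt + noise with covariance rate Q (Van Loan).

    Returns (Phi, Sigma) such that a_t | a_0 ~ N(Phi @ a_0, Sigma).
    """
    d = F.shape[0]
    block = np.zeros((2 * d, 2 * d))
    block[:d, :d] = -F
    block[:d, d:] = Q
    block[d:, d:] = F.T
    E = expm(block * t)
    Phi = E[d:, d:].T        # exp(F t)
    Sigma = Phi @ E[:d, d:]  # integral of exp(F s) Q exp(F s)^T ds over [0, t]
    return Phi, 0.5 * (Sigma + Sigma.T)  # symmetrize away round-off


# Usage: critically damped momentum diffusion, one forward sample without simulation.
beta, mass = 1.0, 0.25
gamma = 2.0 * np.sqrt(mass)                     # critical damping: zeta = 1
F = momentum_drift(beta, gamma, mass)
Q = np.diag([0.0, 2.0 * gamma * beta])          # noise enters only the velocity
Phi, Sigma = gaussian_transition(F, Q, t=1.0)

rng = np.random.default_rng(0)
a0 = np.array([1.5, 0.0])                       # data point x_0 with zero initial velocity
a_t = rng.multivariate_normal(Phi @ a0, Sigma)  # sample of a_t given a_0
```

In this sketch, adding a linear score term to the velocity drift would only change the matrix `F`, so the same closed-form transition would still apply.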

Paper Structure

This paper contains 38 sections, 6 theorems, 43 equations, 19 figures, 3 tables.

Key Result

Proposition 1

Given $\beta, \gamma>0$, the solution admits a stochastic representation, where $\overrightarrow y_t = \log \overrightarrow\psi({\bm{\mathrm{a}}}_t, t)$, $\overleftarrow y_t=\log \overleftarrow\varphi({\bm{\mathrm{a}}}_t, t)$, $\overrightarrow{\bf z}_t =\sqrt{\beta}\, \nabla_{{\bm{\mathrm{v}}}} \overrightarrow y_t$, $\overleftarrow{\bf z}_t =\sqrt{\beta}\, \nabla_{{\bm{\mathrm{v}}}} \overleftarrow y_t$, …
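Proposition 1 is labeled a Feynman–Kac formula, the classical link between linear parabolic PDEs and expectations over SDE paths. As a generic one-dimensional illustration of that link (not the paper's forward–backward system), the sketch below estimates the solution of a backward Kolmogorov PDE by averaging a terminal condition over simulated Ornstein–Uhlenbeck paths and compares it with the closed form. The process, the terminal condition g(x) = x², and the helper names are hypothetical choices made for the example.

```python
import numpy as np


def feynman_kac_ou(x0, tau, g, theta=1.0, sigma=1.0, n_paths=100_000, n_steps=200, seed=0):
    """Monte Carlo estimate of u(x0, t) = E[g(X_T) | X_t = x0] for dX = -theta X ds + sigma dW.

    By Feynman-Kac, u solves du/dt - theta x du/dx + 0.5 sigma^2 d2u/dx2 = 0
    with terminal condition u(x, T) = g(x); tau = T - t is the remaining time.
    """
    rng = np.random.default_rng(seed)
    dt = tau / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        # Euler-Maruyama step of the Ornstein-Uhlenbeck SDE
        x = x - theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return float(np.mean(g(x)))


def exact_ou_second_moment(x0, tau, theta=1.0, sigma=1.0):
    # Closed form of E[X_T^2 | X_t = x0], i.e. u(x0, t) for g(x) = x**2
    decay = np.exp(-theta * tau)
    var = sigma**2 / (2 * theta) * (1 - decay**2)
    return float((x0 * decay) ** 2 + var)


mc = feynman_kac_ou(1.0, 1.0, lambda y: y**2)
exact = exact_ou_second_moment(1.0, 1.0)
```

The Monte Carlo estimate carries both sampling error and Euler–Maruyama discretization bias, so it matches the closed form only approximately.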

Figures (19)

  • Figure 1: Comparison with existing methodologies and algorithm properties.
  • Figure 2: CLD-5 (left two) vs. VSULD-5 (right two) on spiral-8Y and checkerboard-6X.
  • Figure 3: Sample quality evaluation. The damping ratios for ULD and VSULD are both fixed to 0.7.
  • Figure 4: Straightness metric of probability flow ODEs on the non-stretched dimension via CLD, ULD, VSCLD, and VSULD.
  • Figure 5: Overdamped versus underdamped models.
  • ...and 14 more figures
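Figure 4 compares methods by the straightness of their probability-flow ODE trajectories, but the metric is not defined in this summary. A minimal sketch, assuming the time-averaged definition used in rectified-flow work, $S = \int_0^1 \mathbb{E}\,\lVert (\mathbf z_1 - \mathbf z_0) - \dot{\mathbf z}_t \rVert^2\, dt$, approximated with finite differences on a uniform time grid. Whether the paper uses exactly this normalization is an assumption, and `straightness` is a hypothetical helper.

```python
import numpy as np


def straightness(traj: np.ndarray, t0: float = 0.0, t1: float = 1.0) -> float:
    """Time-averaged straightness of ODE trajectories (0 means straight with constant speed).

    traj has shape (num_steps + 1, batch, dim): states z_{t_k} on a uniform grid over [t0, t1].
    """
    num_steps = traj.shape[0] - 1
    dt = (t1 - t0) / num_steps
    total = (traj[-1] - traj[0]) / (t1 - t0)            # average velocity of each path
    step_vel = np.diff(traj, axis=0) / dt               # finite-difference velocity per step
    sq_err = np.sum((step_vel - total) ** 2, axis=-1)   # shape (num_steps, batch)
    return float(sq_err.mean())


# Usage: a straight constant-speed path versus a quarter-circle arc.
ts = np.linspace(0.0, 1.0, 101)
line = np.stack([ts, 2.0 * ts], axis=-1)[:, None, :]
arc = np.stack([np.cos(np.pi * ts / 2), np.sin(np.pi * ts / 2)], axis=-1)[:, None, :]
s_line = straightness(line)  # essentially zero
s_arc = straightness(arc)    # close to pi**2 / 4 - 2
```

To evaluate a single coordinate, such as the non-stretched dimension in Figure 4, slice the last axis before calling the helper, for example `straightness(traj[..., [j]])` for a chosen index `j`.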

Theorems & Definitions (9)

  • Proposition 1: Feynman-Kac formula
  • Lemma 1: Invariant Measure
  • Proof
  • Theorem 1: Fixed Generation Quality
  • Proof
  • Remark 1
  • Lemma 2: Local stability
  • Theorem 2: Convergence in $L^2$
  • Theorem 3