Generative Modeling with Phase Stochastic Bridges

Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Joshua Susskind, Shuangfei Zhai

TL;DR

The paper addresses the sampling inefficiency of diffusion models by introducing phase-space dynamics that incorporate velocity, formulating Acceleration Generative Modeling (AGM) within a Stochastic Optimal Control framework. It derives an analytical phase-space Brownian-bridge-like solution with an optimal acceleration $a^*(\mathbf{m}_t,t) = g_t^2 P_{11}\left(\frac{\mathbf{x}_1-\mathbf{x}_t}{1-t}-\mathbf{v}_t\right)$, yielding AGM-SDE and AGM-ODE and a trainable force term $F_t^{\theta}$. A key innovation is sampling-hop, which enables early estimation of the target data point $\mathbf{x}_1$ using both state and velocity, improving efficiency especially at low NFEs and enabling conditional generation. Across toy and image datasets (CIFAR-10, AFHQv2, ImageNet-64), AGM demonstrates competitive or superior performance in low-NFE regimes and offers a new pathway for fast, velocity-informed generative modeling. The approach opens opportunities to integrate momentum information into diffusion-like models, potentially reducing computational cost for high-quality sample generation.
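The optimal acceleration quoted above can be evaluated directly once its ingredients are known. The snippet below is a minimal sketch, not the authors' implementation: the function name `optimal_acceleration` is hypothetical, and the diffusion coefficient $g_t$ and the coefficient $P_{11}$ are passed in as plain scalars because their schedules are not given in this summary.

```python
import numpy as np

def optimal_acceleration(x_t, v_t, x_1, t, g_t, p11):
    """Evaluate a*(m_t, t) = g_t^2 * P_11 * ((x_1 - x_t) / (1 - t) - v_t).

    g_t and p11 are assumed to be scalars supplied by the caller;
    how they depend on t is not specified in this summary.
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1) so that 1 - t is positive")
    x_t, v_t, x_1 = (np.asarray(a, dtype=float) for a in (x_t, v_t, x_1))
    # Velocity that would carry x_t to x_1 in the remaining time 1 - t.
    straight_line_velocity = (x_1 - x_t) / (1.0 - t)
    # The acceleration pushes the current velocity toward that value.
    return (g_t ** 2) * p11 * (straight_line_velocity - v_t)

# Illustrative values only (not taken from the paper).
a = optimal_acceleration(
    x_t=[0.0, 0.0], v_t=[1.0, 0.0], x_1=[1.0, 2.0], t=0.5, g_t=2.0, p11=0.5
)
```

Read this way, the formula acts as a feedback term: the acceleration vanishes exactly when the current velocity already equals $(\mathbf{x}_1-\mathbf{x}_t)/(1-t)$, the velocity of the straight path to the target.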

Abstract

Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (i.e., position space) and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity. Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation. This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool for generative modeling.

Paper Structure

This paper contains 40 sections, 7 theorems, 77 equations, 21 figures, 5 tables, and 2 algorithms.

Key Result

Proposition 3

When $\mathbf{r}\rightarrow +\infty$, the solution to the stochastic bridge optimization problem (eq:stochastic-bridge) is,

Figures (21)

  • Figure 1: Comparison of pixel-wise trajectories with CLD [dockhorn2021score]. The left figures show the position and velocity trajectories over time for 16 randomly sampled pixels. Our model learns straighter trajectories, which helps reduce sampling complexity.
  • Figure 2: Data estimation comparison with EDM [karras2022elucidating]. When the network is given the additional velocity, AGM gains the capacity to estimate the target data point during the early stages of the trajectory. One can use the estimated image $\tilde{\mathbf{x}}_1$ at $t_{i}<t_N$ as the generated result and allocate more NFEs to the interval $[0,t_{i}]$, which results in a smaller discretization error.
  • Figure 3: The standard deviation $\sigma$ of the terminal marginal for the uncontrolled dynamics. We empirically selected the hyperparameter $k=-0.2$. With this choice, the uncontrolled dynamics induce a terminal marginal distribution whose $\sigma$ covers the data range.
  • Figure 4: Comparison with EDM [karras2022elucidating] on the AFHQv2 dataset. AGM-ODE exhibits superior generative performance when the NFE is extremely low, because its dynamics incorporate velocity when predicting the estimated data point.
  • Figure 5: AGM can generate conditional results from an unconditional model by injecting the conditional information into the velocity $\mathbf{v}_0$, which yields a new initial velocity $\mathbf{v}_0^{cond}$.
  • ...and 16 more figures

Theorems & Definitions (22)

  • Remark 1
  • Definition 2: Stochastic Bridge problem of linear momentum system [chen2015stochastic]
  • Proposition 3: Phase Space Brownian Bridge
  • proof
  • Remark 4
  • Proposition 5: Sampling-Hop
  • proof
  • Lemma 6
  • proof
  • Remark 7
  • ...and 12 more