Generative Modeling with Phase Stochastic Bridges
Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Joshua Susskind, Shuangfei Zhai
TL;DR
The paper addresses the sampling inefficiency of diffusion models by introducing phase-space dynamics that incorporate velocity, formulating Acceleration Generative Modeling (AGM) within a Stochastic Optimal Control framework. It derives an analytical phase-space Brownian-bridge-like solution with an optimal acceleration $a^*(\mathbf{m}_t,t) = g_t^2 P_{11}\left(\frac{\mathbf{x}_1-\mathbf{x}_t}{1-t}-\mathbf{v}_t\right)$, yielding AGM-SDE and AGM-ODE and a trainable force term $F_t^{\theta}$. A key innovation is sampling-hop, which enables early estimation of the target data point $\mathbf{x}_1$ using both state and velocity, improving efficiency especially at low NFEs and enabling conditional generation. Across toy and image datasets (CIFAR-10, AFHQv2, ImageNet-64), AGM demonstrates competitive or superior performance in low-NFE regimes and offers a new pathway for fast, velocity-informed generative modeling. The approach opens opportunities to integrate momentum information into diffusion-like models, potentially reducing computational cost for high-quality sample generation.
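As a rough illustration of the optimal-acceleration formula quoted above, the following minimal sketch evaluates $a^*$ for given inputs. It assumes that $\mathbf{m}_t$ stacks the position $\mathbf{x}_t$ and the velocity $\mathbf{v}_t$, and it treats the diffusion coefficient $g_t$ and the scalar $P_{11}$ as caller-supplied numbers, since this summary does not specify their time dependence. The function name and argument names are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def optimal_acceleration(x_t, v_t, x_1, t, g_t, p_11):
    """Evaluate a* = g_t^2 * P_11 * ((x_1 - x_t) / (1 - t) - v_t).

    x_t, v_t, x_1 : array-likes of the same shape (position, velocity, target).
    t             : time in [0, 1); the formula is singular at t = 1.
    g_t, p_11     : scalars supplied by the caller (assumed, not derived here).
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    x_t, v_t, x_1 = (np.asarray(a, dtype=float) for a in (x_t, v_t, x_1))
    # Velocity that would reach x_1 at time 1 along a straight line from x_t.
    target_velocity = (x_1 - x_t) / (1.0 - t)
    # The acceleration pushes the current velocity toward that target velocity.
    return g_t**2 * p_11 * (target_velocity - v_t)


# Example call with made-up values for g_t and p_11.
a_star = optimal_acceleration(
    x_t=[0.0, 0.0], v_t=[1.0, 1.0], x_1=[1.0, 2.0], t=0.5, g_t=1.0, p_11=2.0
)
```

Read this way, the acceleration vanishes exactly when the current velocity already equals $(\mathbf{x}_1-\mathbf{x}_t)/(1-t)$, which is consistent with the summary's claim that state and velocity together allow an early estimate of $\mathbf{x}_1$.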
Abstract
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (i.e., position space) and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity. Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation. This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool for generative modeling.
