An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models
Yuang Wang, Pengfei Jin, Li Zhang, Quanzheng Li, Zhiqiang Chen, Dufan Wu
TL;DR
Diffusion bridge models start generation from corrupted images yet typically rely on SDE samplers, which can slow conditional generation. The authors propose a high-order ODE sampler with a stochastic start (ODES3): posterior sampling at the first reverse step bypasses the singularity of the PF-ODE at its start, and Heun's second-order method then solves the PF-ODE, achieving high perceptual quality with fewer neural function evaluations (NFEs). The method is training-free and compatible with pretrained diffusion bridge models. Extensive experiments on image restoration and translation show improved FID and visual quality over state-of-the-art methods. This work provides a practical route to faster, more accurate conditional generation with diffusion bridges and motivates exploring additional high-order ODE solvers in future research.
Abstract
Diffusion bridge models have demonstrated promising performance in conditional image generation tasks, such as image restoration and translation, by initializing the generative process from corrupted images instead of pure Gaussian noise. However, existing diffusion bridge models often rely on Stochastic Differential Equation (SDE) samplers, which makes inference slower than in diffusion models that employ high-order Ordinary Differential Equation (ODE) solvers for acceleration. To close this gap, we propose a high-order ODE sampler with a stochastic start for diffusion bridge models. To overcome the singular behavior of the probability flow ODE (PF-ODE) at the beginning of the reverse process, we introduce posterior sampling at the first reverse step. This step is designed to ensure a smooth transition from corrupted images to the generative trajectory while reducing discretization errors. Following this stochastic start, Heun's second-order solver is applied to solve the PF-ODE, achieving high perceptual quality with significantly fewer neural function evaluations (NFEs). Our method is fully compatible with pretrained diffusion bridge models and requires no additional training. Extensive experiments on image restoration and translation tasks, including super-resolution, JPEG restoration, Edges-to-Handbags, and DIODE-Outdoor, demonstrate that our sampler outperforms state-of-the-art methods in both visual quality and Fréchet Inception Distance (FID).
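The two-stage structure described in the abstract (one stochastic first step, then Heun's second-order integration) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `heun_solve` and `sample_with_stochastic_start`, the `drift` callable, and the `noise_scale` parameter are hypothetical, and the Gaussian perturbation is only a stand-in for the posterior sampling step, whose exact form is not given here. In a real sampler, `drift` would wrap a pretrained network, so each call counts as one NFE.

```python
import numpy as np


def heun_solve(drift, x, t_grid):
    """Integrate dx/dt = drift(x, t) along t_grid with Heun's second-order method.

    Returns the final state and the number of drift evaluations (NFEs).
    """
    nfe = 0
    for t_cur, t_next in zip(t_grid[:-1], t_grid[1:]):
        h = t_next - t_cur                   # negative when integrating in reverse time
        d_cur = drift(x, t_cur)              # slope at the current point
        x_euler = x + h * d_cur              # Euler predictor
        d_next = drift(x_euler, t_next)      # slope at the predicted point
        x = x + 0.5 * h * (d_cur + d_next)   # trapezoidal corrector
        nfe += 2                             # two drift evaluations per Heun step
    return x, nfe


def sample_with_stochastic_start(drift, x_corrupted, t_grid, noise_scale, rng):
    """Hypothetical two-stage sampler: stochastic first step, then Heun.

    ASSUMPTION: adding isotropic Gaussian noise is a placeholder for the
    posterior sampling step; it only marks where that step would sit.
    """
    noise = rng.standard_normal(x_corrupted.shape)
    x = x_corrupted + noise_scale * noise    # placeholder for the first reverse step
    # The remaining grid points are handled deterministically by Heun's method.
    return heun_solve(drift, x, t_grid[1:])


if __name__ == "__main__":
    # Toy check on dx/dt = -x, whose exact solution at t = 1 is exp(-1).
    toy_drift = lambda x, t: -x
    x_end, nfe = heun_solve(toy_drift, np.ones(3), np.linspace(0.0, 1.0, 11))
    print("Heun result:", x_end, "exact:", np.exp(-1.0), "NFEs:", nfe)
```

The sketch takes the first step stochastically because the abstract identifies the start of the reverse process as the point where the PF-ODE is singular; only after leaving that point does a deterministic high-order solver take over. Heun's method spends two evaluations per step, so the total budget is set by how few steps the time grid needs.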
