Bidirectional Diffusion Bridge Models

Duc Kieu, Kien Do, Toan Nguyen, Dang Nguyen, Thin Nguyen

TL;DR

This work introduces the Bidirectional Diffusion Bridge Model (BDBM), a single-network framework for bidirectional translation between two coupled data distributions that exploits the Chapman-Kolmogorov equation for diffusion bridges. By sharing one noise predictor across the forward and backward directions and using a binary mask to switch between them, BDBM learns both directions without duplicating models. The method derives tractable forward and backward transitions under Gaussian marginals, connects to Doob's h-transform and variational perspectives, and is evaluated on four high-resolution paired I2I datasets in both pixel and latent spaces, where it outperforms state-of-the-art unidirectional and bidirectional baselines. These results indicate lower training cost than training two unidirectional models, along with improved sample quality and diversity, and suggest that bidirectional diffusion bridging could be extended to additional domains and multimodal tasks.

Abstract

Diffusion bridges have shown potential in paired image-to-image (I2I) translation tasks. However, existing methods are limited by their unidirectional nature, requiring separate models for forward and reverse translations. This not only doubles the computational cost but also restricts their practicality. In this work, we introduce the Bidirectional Diffusion Bridge Model (BDBM), a scalable approach that facilitates bidirectional translation between two coupled distributions using a single network. BDBM leverages the Chapman-Kolmogorov Equation for bridges, enabling it to model data distribution shifts across timesteps in both forward and backward directions by exploiting the interchangeability of the initial and target timesteps within this framework. Notably, when the marginal distribution given endpoints is Gaussian, BDBM's transition kernels in both directions possess analytical forms, allowing for efficient learning with a single network. We demonstrate the connection between BDBM and existing bridge methods, such as Doob's h-transform and variational approaches, and highlight its advantages. Extensive experiments on high-resolution I2I translation tasks demonstrate that BDBM not only enables bidirectional translation with minimal additional cost but also outperforms state-of-the-art bridge models. Our source code is available at https://github.com/kvmduc/BDBM.

Paper Structure

This paper contains 48 sections, 58 equations, 13 figures, 8 tables, 3 algorithms.

Figures (13)

  • Figure 1: An illustration of Bidirectional Diffusion Bridge Models (BDBM). Instead of learning two separate models $z_{\theta}\left(t,x_{t},x_{0}\right)$ and $z_{\phi}\left(s,x_{s},x_{T}\right)$ for the forward and backward transitions, we learn a single model $z_{\varphi}\left(t,x_{t},\left(1-m\right)*x_{0},m*x_{T}\right)$ with a binary mask $m$ that enables transition in both directions. Grey and white nodes denote initial and generated samples, respectively.
  • Figure 2: Images generated by BDBM and unidirectional baselines in the Edges$\rightarrow$Shoes, Edges$\rightarrow$Handbags, and Normal$\rightarrow$Outdoor translation tasks.
  • Figure 3: LPIPS curves of BDBM and unidirectional baselines on Edges$\rightarrow$Shoes and Edges$\rightarrow$Handbags.
  • Figure 4: Images generated by BDBM and bidirectional baselines on Edges$\leftrightarrow$Shoes and Edges$\leftrightarrow$Handbags. The "Reference" column shows reference images from the two domains.
  • Figure 5: Samples generated by BDBM when translating from sketches to shoes using NFE=20 and NFE=200, for different values of $\eta$.
  • ...and 8 more figures
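
The masked conditioning described in the Figure 1 caption, $z_{\varphi}\left(t,x_{t},\left(1-m\right)*x_{0},m*x_{T}\right)$, can be illustrated with a small sketch. This is not the authors' implementation: the function names `masked_endpoints` and `build_network_input`, the scalar per-sample mask, and the channel-wise concatenation of inputs are all assumptions made for illustration. The only part taken from the caption is the masking rule itself: with $m=0$ the network sees $x_{0}$ and a zeroed $x_{T}$ slot, and with $m=1$ it sees $x_{T}$ and a zeroed $x_{0}$ slot.

```python
import numpy as np


def masked_endpoints(x0, xT, m):
    """Return ((1 - m) * x0, m * xT) for a binary mask m in {0, 1}.

    Assumption: m is a single scalar per sample. The caption only says
    "binary mask", so a per-sample scalar is an illustrative choice.
    """
    if m not in (0, 1):
        raise ValueError("m must be 0 or 1")
    return (1 - m) * x0, m * xT


def build_network_input(x_t, x0, xT, m):
    """Stack x_t with the masked endpoints along the channel axis.

    Hypothetical input layout (channels-first arrays of equal shape);
    a real model may feed its conditioning differently.
    """
    c0, cT = masked_endpoints(x0, xT, m)
    return np.concatenate([x_t, c0, cT], axis=0)


# Toy data: 3-channel 4x4 "images" standing in for the two domains.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(3, 4, 4))
xT = rng.normal(size=(3, 4, 4))
x_t = rng.normal(size=(3, 4, 4))

inp_m0 = build_network_input(x_t, x0, xT, m=0)  # conditions on x0 only
inp_m1 = build_network_input(x_t, x0, xT, m=1)  # conditions on xT only
```

Because both modes produce inputs of the same shape, one set of network weights can serve both directions, and the zeroed slot tells the network which endpoint is known.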