OT-ALD: Aligning Latent Distributions with Optimal Transport for Accelerated Image-to-Image Translation

Zhanpeng Wang, Shuting Cao, Yuhang Lu, Yuhan Li, Na Lei, Zhongxuan Luo

TL;DR

OT-ALD tackles unpaired image-to-image translation with diffusion models by addressing two key DDIB limitations: latent-distribution mismatch and translation inefficiency. It does so by computing an optimal-transport map $M_{ot,T}^{A\to B}$ that aligns the source-domain latent distribution $p_T^A$ with the target-domain latent distribution $p_T^B$ before reverse diffusion in the target domain. Theoretical guarantees establish cycle consistency at both the sample and distribution levels and quantify the error introduced by residual mismatches, while experiments on four tasks across three datasets show improved sampling efficiency (≈20%) and lower FID (by ≈2.6 on average) compared with the top-performing baselines. The approach retains the flexibility and cycle consistency of DDIB and offers a practical speed-quality trade-off through the diffusion termination time $T$ and the noise scale $\eta$, making high-resolution I2I translation more efficient and reliable in real-world settings.
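
Computing an OT map between two latent distributions is the central new step. As a toy illustration only (the paper's OT solver is not described in this section), the sketch below computes a discrete OT map between two equally weighted point clouds standing in for samples of $p_T^A$ and $p_T^B$. Under squared Euclidean cost this reduces to an assignment problem, solved here by exhaustive search, so it is only usable for a handful of points. The names `discrete_ot_map` and `sq_dist` are hypothetical.

```python
import itertools
import math

Point = tuple[float, ...]


def sq_dist(x: Point, y: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def discrete_ot_map(src: list[Point], tgt: list[Point]) -> tuple[list[int], float]:
    """Toy discrete OT between uniform empirical measures of equal size.

    Returns (perm, cost): src[i] is transported to tgt[perm[i]], and cost is the
    mean squared displacement, i.e. the squared empirical W2 distance.
    """
    if len(src) != len(tgt) or not src:
        raise ValueError("expects two non-empty point sets of equal size")
    n = len(src)
    best_perm: list[int] = []
    best_cost = math.inf
    # Exhaustive search over all n! assignments: exact, but only viable for tiny n.
    for perm in itertools.permutations(range(n)):
        cost = sum(sq_dist(src[i], tgt[perm[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_perm, best_cost = list(perm), cost
    return best_perm, best_cost


# Hypothetical "latent samples": the target cloud is the source cloud shifted
# by (2, 1), listed in shuffled order.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tgt = [(2.0, 2.0), (2.0, 1.0), (3.0, 1.0)]
perm, cost = discrete_ot_map(src, tgt)
print(perm, cost)
```

Here the optimal map undoes the shuffle and recovers the pure shift, giving `perm == [1, 2, 0]` with cost 5.0 (the squared length of the shift). For realistic latent dimensions and sample sizes a scalable OT solver would be needed; the brute-force search only keeps the example dependency-free.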

Abstract

The Dual Diffusion Implicit Bridge (DDIB) is an emerging image-to-image (I2I) translation method that preserves cycle consistency while achieving strong flexibility. It links two independently trained diffusion models (DMs) in the source and target domains by first adding noise to a source image to obtain a latent code, then denoising it in the target domain to generate the translated image. However, this method faces two key challenges: (1) low translation efficiency, and (2) translation trajectory deviations caused by mismatched latent distributions. To address these issues, we propose a novel I2I translation framework, OT-ALD, grounded in optimal transport (OT) theory, that retains the strengths of the DDIB-based approach. Specifically, we compute an OT map from the latent distribution of the source domain to that of the target domain, and use the mapped distribution as the starting point for the reverse diffusion process in the target domain. Our error analysis confirms that OT-ALD eliminates latent distribution mismatches. Moreover, OT-ALD effectively balances faster image translation with improved image quality. Experiments on four translation tasks across three high-resolution datasets show that OT-ALD improves sampling efficiency by 20.29% and reduces the FID score by 2.6 on average compared to the top-performing baseline models.
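
The pipeline described in the abstract can be illustrated in an idealized setting where everything has a closed form. The sketch below assumes (none of this comes from the paper) 1-D Gaussian domains, an Ornstein-Uhlenbeck forward process, and exact scores, so deterministic encoding and decoding along the probability-flow ODE are affine maps. It compares the translated distribution against $p_0^B$ in $\mathcal{W}_2$ with and without the OT alignment step; `Gaussian1D`, `diffuse`, and `translated_dist` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian1D:
    mean: float
    std: float


def diffuse(p: Gaussian1D, t: float) -> Gaussian1D:
    """Marginal at time t of the OU process dx = -x dt + sqrt(2) dW started from p."""
    decay = math.exp(-t)
    return Gaussian1D(p.mean * decay, math.sqrt(p.std**2 * decay**2 + 1.0 - decay**2))


def w2(p: Gaussian1D, q: Gaussian1D) -> float:
    """Exact 2-Wasserstein distance between two 1-D Gaussians."""
    return math.hypot(p.mean - q.mean, p.std - q.std)


def translated_dist(pA: Gaussian1D, pB: Gaussian1D, T: float, align: bool) -> Gaussian1D:
    """Distribution of translated samples: encode domain A to time T, decode with DM^B.

    With exact scores the probability-flow ODE keeps z = (x_t - mean_t) / std_t fixed,
    so encoding and decoding are affine and every distribution stays Gaussian.
    """
    pA_T, pB_T = diffuse(pA, T), diffuse(pB, T)
    latent = pA_T  # DDIB: the reverse process in B starts from p_T^A
    if align:
        # OT-ALD (toy): the 1-D OT map x -> pB_T.mean + pB_T.std / pA_T.std * (x - pA_T.mean)
        # pushes p_T^A exactly onto p_T^B.
        latent = pB_T
    # Decoding with DM^B: x_0 = pB.mean + pB.std * (x_T - pB_T.mean) / pB_T.std
    scale = pB.std / pB_T.std
    return Gaussian1D(pB.mean + scale * (latent.mean - pB_T.mean), scale * latent.std)


pA, pB = Gaussian1D(-2.0, 0.5), Gaussian1D(3.0, 1.5)
for T in (0.5, 1.0, 2.0, 4.0):
    ddib = w2(translated_dist(pA, pB, T, align=False), pB)
    ot_ald = w2(translated_dist(pA, pB, T, align=True), pB)
    print(f"T={T}: DDIB W2={ddib:.4f}, aligned W2={ot_ald:.1e}")
```

In this toy, the unaligned (DDIB-style) error shrinks as $T$ grows because diffusion contracts the gap between $p_T^A$ and $p_T^B$, while the aligned variant is exact for every $T$ up to floating-point error. Real DMs add score-approximation and discretization error, and per the Figure 1(c) caption too few or too many steps still hurt in practice.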

Paper Structure

This paper contains 27 sections, 4 theorems, 20 equations, 10 figures, 4 tables, and 2 algorithms.

Key Result

Theorem 1

If $q_{T}^{B}=p_{T}^{A}$ and $q_{0}^{B}$ denotes the distribution generated by $\mathrm{DM}^{B}$, then $\mathcal{W}_{2}(p_{0}^{B},q_{0}^{B})$ can be estimated as follows, where $I^{B}(T)=\exp\left(\int_{0}^{T}\left(L_{f}^{B}(t)+\frac{g^{B}(t)^{2}}{2}L_{\boldsymbol{S}_{\boldsymbol{\theta}}}^{B}(t)\right)dt\right)$ and $\bar{I}^{B}(T)=\exp(\ldots)$ …
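
Reading the theorem with the usual SDE notation $dx = f(x,t)\,dt + g(t)\,dw$, and taking $L_f^B(t)$ and $L_{S_\theta}^B(t)$ to be Lipschitz constants (in $x$) of the drift and of the learned score (an interpretation, since their definitions are not included here), $I^B(T)$ has the form of a Grönwall-type growth factor that increases with $T$. The sketch below only evaluates this factor numerically with the trapezoidal rule; the bound itself is not reproduced, and the VP-SDE schedule and the constant score Lipschitz value are hypothetical.

```python
import math
from typing import Callable


def growth_factor(
    L_f: Callable[[float], float],
    g: Callable[[float], float],
    L_S: Callable[[float], float],
    T: float,
    n: int = 1000,
) -> float:
    """Approximate I(T) = exp( ∫_0^T [L_f(t) + g(t)^2 / 2 * L_S(t)] dt ) by the trapezoidal rule."""
    if n < 1 or T < 0:
        raise ValueError("need n >= 1 and T >= 0")
    h = T / n

    def integrand(t: float) -> float:
        return L_f(t) + 0.5 * g(t) ** 2 * L_S(t)

    interior = sum(integrand(k * h) for k in range(1, n))
    integral = h * (0.5 * integrand(0.0) + interior + 0.5 * integrand(T))
    return math.exp(integral)


# Hypothetical VP-SDE with linear schedule beta(t) = 0.1 + 19.9 t on [0, 1]:
# drift f(x, t) = -beta(t) x / 2 gives L_f(t) = beta(t) / 2, and g(t) = sqrt(beta(t)).
def beta(t: float) -> float:
    return 0.1 + 19.9 * t


def vp_drift_lipschitz(t: float) -> float:
    return beta(t) / 2


def vp_diffusion(t: float) -> float:
    return math.sqrt(beta(t))


def assumed_score_lipschitz(t: float) -> float:
    return 1.0  # purely illustrative; the true constant is unknown here


factor = growth_factor(vp_drift_lipschitz, vp_diffusion, assumed_score_lipschitz, T=1.0)
print(f"I(1.0) ≈ {factor:.1f}")
```

With these assumed values the integrand reduces to $\beta(t)$, so the exponent is $\int_0^1 \beta(t)\,dt = 10.05$ and the trapezoidal rule is exact for this linear integrand. Because the exponent grows with $T$, a factor of this kind is one way to see why longer reverse trajectories can amplify errors, consistent with the Figure 1(c) caption.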

Figures (10)

  • Figure 1: (a) Top-row images are sources and bottom-row images are the corresponding outputs generated by our model. (b) Contraction property of DMs. For initial distributions from different domains, DMs can exponentially shrink the Wasserstein distance between their latent distributions through progressive noise injection. $T$ is the number of diffusion steps, and $q_{T}^{B}$ is the initial distribution for the reverse process of $\mathrm{DM}^B$. In DDIB-based methods, $q_{T}^{B}=p_{T}^{A}$, while OT-ALD aligns latent distributions via the OT map $M_{ot,T}^{A\to B}$ so that $q_{T}^{B}=M_{ot,T}^{A\to B}(p_{T}^{A})$. (c) Latent alignment affects how closely the translated distribution approximates the ground truth (see the Wasserstein-distance upper-bound theorem). DDIB requires longer diffusion to compensate for misalignment, which reduces efficiency. OT-ALD is less sensitive to $T$, but empirically, too few steps harm image quality and diversity, while too many accumulate errors.
  • Figure 2: The framework of OT-ALD. During training, two DMs are independently trained in domains $A$ and $B$, followed by computation of the OT map $M_{ot,T}^{A \to B}$ from $p_T^A$ to $p_T^B$. In translation, the source distribution $p_0^A$ is diffused to $p_T^A$, which is then mapped via $M_{ot,T}^{A \to B}$ to serve as the initial distribution for the reverse process in domain $B$, yielding the translated distribution $q_0^B$. In contrast, DDIB-based methods skip OT alignment and directly use $p_T^A$, leading to latent distribution mismatch. As shown by the Wasserstein-distance upper-bound theorem, this mismatch introduces a theoretical gap that affects translation accuracy.
  • Figure 3: The smaller $\eta$ and $T$ are, the more faithfully OT-ALD preserves the source image; conversely, larger values of $\eta$ and $T$ allow OT-ALD to achieve greater diversity in I2I translation tasks.
  • Figure 4: Flexibility tests of OT-ALD using the trained DMs and OT map. Top-row images are sources; bottom-row images are the corresponding outputs generated by OT-ALD.
  • Figure 5: Comparison of cycle consistency between OT-ALD and DDIB-based methods.
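
The contraction property in Figure 1(b) can be checked exactly in a toy model. Assuming, for illustration only, 1-D Gaussian domains and an Ornstein-Uhlenbeck forward process $dx = -x\,dt + \sqrt{2}\,dW$, driving two copies with the same Brownian noise gives $X_t - Y_t = e^{-t}(X_0 - Y_0)$, hence $\mathcal{W}_2(p_t^A, p_t^B) \le e^{-t}\,\mathcal{W}_2(p_0^A, p_0^B)$. The sketch computes both sides in closed form; the function names are hypothetical.

```python
import math


def ou_marginal(mean: float, std: float, t: float) -> tuple[float, float]:
    """(mean, std) at time t of dx = -x dt + sqrt(2) dW started from N(mean, std^2)."""
    decay = math.exp(-t)
    return mean * decay, math.sqrt(std**2 * decay**2 + 1.0 - decay**2)


def w2_gauss(m1: float, s1: float, m2: float, s2: float) -> float:
    """Exact W2 distance between N(m1, s1^2) and N(m2, s2^2) in one dimension."""
    return math.hypot(m1 - m2, s1 - s2)


def contraction_table(
    pA: tuple[float, float], pB: tuple[float, float], times: list[float]
) -> list[tuple[float, float, float]]:
    """For each t, return (t, W2(p_t^A, p_t^B), e^{-t} * W2(p_0^A, p_0^B))."""
    w0 = w2_gauss(*pA, *pB)
    rows = []
    for t in times:
        dist = w2_gauss(*ou_marginal(*pA, t), *ou_marginal(*pB, t))
        rows.append((t, dist, math.exp(-t) * w0))
    return rows


pA, pB = (-2.0, 0.5), (3.0, 1.5)  # hypothetical (mean, std) of two domains
for t, dist, bound in contraction_table(pA, pB, [0.0, 0.5, 1.0, 2.0, 4.0]):
    print(f"t={t}: W2={dist:.4f} <= e^-t * W2_0 = {bound:.4f}")
```

The synchronous-coupling bound is only one way to show contraction; real DMs use time-dependent noise schedules, but the qualitative picture of an exponentially shrinking latent gap is the same one the caption describes.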

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Theorem 3: Sample cycle consistency
  • Theorem 4: Distributional cycle consistency
  • Proof
  • Proof
  • Proof
  • Proof
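
Theorem 3 concerns sample-level cycle consistency. Its exact assumptions are not reproduced here, but the idea can be sketched in an idealized toy setting (assumed, not taken from the paper): 1-D Gaussian domains, an OU forward process, exact probability-flow ODEs, and the closed-form 1-D OT map, whose inverse is the OT map in the opposite direction. Translating A→B and then B→A returns each source sample up to floating-point error. All names below are hypothetical.

```python
import math

Gauss = tuple[float, float]  # (mean, std) of a 1-D Gaussian domain


def marginal(p: Gauss, t: float) -> Gauss:
    """(mean, std) at time t under the OU process dx = -x dt + sqrt(2) dW."""
    decay = math.exp(-t)
    return p[0] * decay, math.sqrt(p[1] ** 2 * decay**2 + 1.0 - decay**2)


def encode(x0: float, p0: Gauss, T: float) -> float:
    """Exact probability-flow ODE from time 0 to T: preserves the standardized value."""
    mT, sT = marginal(p0, T)
    return mT + sT * (x0 - p0[0]) / p0[1]


def decode(xT: float, p0: Gauss, T: float) -> float:
    """Exact probability-flow ODE from time T back to 0 (inverse of encode)."""
    mT, sT = marginal(p0, T)
    return p0[0] + p0[1] * (xT - mT) / sT


def ot_map_1d(x: float, src: Gauss, tgt: Gauss) -> float:
    """Monotone affine map pushing N(src) onto N(tgt); W2-optimal in one dimension."""
    return tgt[0] + tgt[1] * (x - src[0]) / src[1]


def translate(x0: float, p_src: Gauss, p_tgt: Gauss, T: float) -> float:
    """Encode in the source domain, align latents with the OT map, decode in the target."""
    xT_src = encode(x0, p_src, T)
    xT_tgt = ot_map_1d(xT_src, marginal(p_src, T), marginal(p_tgt, T))
    return decode(xT_tgt, p_tgt, T)


pA, pB, T = (-2.0, 0.5), (3.0, 1.5), 1.0
for x in (-2.7, -2.0, -1.1):
    y = translate(x, pA, pB, T)       # A -> B
    x_back = translate(y, pB, pA, T)  # B -> A
    print(f"{x:+.2f} -> {y:+.4f} -> {x_back:+.4f}")
```

The round trip is exact here because every step is an invertible affine map; with learned scores and discretized solvers, the recovered sample would only approximate the source.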