OT-ALD: Aligning Latent Distributions with Optimal Transport for Accelerated Image-to-Image Translation
Zhanpeng Wang, Shuting Cao, Yuhang Lu, Yuhan Li, Na Lei, Zhongxuan Luo
TL;DR
OT-ALD tackles unpaired image-to-image (I2I) translation with diffusion models by addressing two key limitations of DDIB: latent-distribution mismatch and low translation efficiency. It computes an optimal-transport map $M_{ot,T}^{A\to B}$ that aligns the source-domain latent distribution $p_T^A$ with the target-domain latent distribution $p_T^B$ before reverse diffusion in the target domain. Theoretical guarantees establish both sample- and distribution-level cycle consistency and quantify the error introduced by residual mismatches, while experiments on four tasks across three datasets show improved sampling efficiency (≈20% faster) and a lower FID (reduced by ≈2.6 on average) compared with strong baselines. The approach retains the flexibility and cycle consistency of DDIB and provides a practical speed-quality trade-off via the diffusion termination time $T$ and the noise scale $\eta$, making high-resolution I2I translation more efficient and reliable in real-world settings.
Abstract
The Dual Diffusion Implicit Bridge (DDIB) is an emerging image-to-image (I2I) translation method that preserves cycle consistency while achieving strong flexibility. It links two independently trained diffusion models (DMs) in the source and target domains by first adding noise to a source image to obtain a latent code, then denoising it in the target domain to generate the translated image. However, this method faces two key challenges: (1) low translation efficiency, and (2) translation trajectory deviations caused by mismatched latent distributions. To address these issues, we propose a novel I2I translation framework, OT-ALD, grounded in optimal transport (OT) theory, which retains the strengths of the DDIB-based approach. Specifically, we compute an OT map from the latent distribution of the source domain to that of the target domain, and use the mapped distribution as the starting point for the reverse diffusion process in the target domain. Our error analysis confirms that OT-ALD eliminates latent distribution mismatches. Moreover, OT-ALD effectively balances faster image translation with improved image quality. Experiments on four translation tasks across three high-resolution datasets show that OT-ALD improves sampling efficiency by 20.29% and reduces the FID score by 2.6 on average compared to the top-performing baseline models.
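The latent-alignment step described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the OT solver or the cost function, so the example assumes a squared-Euclidean cost and solves a discrete assignment problem between two equal-sized sample sets with SciPy's `linear_sum_assignment`. The function name `ot_align_latents` and the synthetic Gaussian "latents" are hypothetical stand-ins for samples from $p_T^A$ and $p_T^B$.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_align_latents(z_src, z_tgt):
    """Map each source latent to a target latent via discrete OT.

    Assumption: squared-Euclidean cost and equal sample counts, so the
    optimal coupling is a one-to-one assignment (a permutation).
    """
    # Pairwise squared distances, shape (n_src, n_tgt).
    diff = z_src[:, None, :] - z_tgt[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)

    # Minimum-cost one-to-one matching between the two sample sets.
    row, col = linear_sum_assignment(cost)

    # mapped[i] is the target-domain latent assigned to source latent i.
    mapped = np.empty_like(z_src)
    mapped[row] = z_tgt[col]
    return mapped, cost[row, col].mean()


# Synthetic stand-ins for mismatched latent samples (not real diffusion latents).
rng = np.random.default_rng(0)
z_src = rng.normal(loc=0.5, scale=1.2, size=(256, 8))  # plays the role of p_T^A
z_tgt = rng.normal(loc=0.0, scale=1.0, size=(256, 8))  # plays the role of p_T^B

mapped, ot_cost = ot_align_latents(z_src, z_tgt)
naive_cost = np.mean(np.sum((z_src - z_tgt) ** 2, axis=1))
```

In this toy setting, `mapped` would serve as the starting point for reverse diffusion in the target domain: every mapped latent is an actual sample from the target set, and the average transport cost `ot_cost` is no larger than the cost of the arbitrary index-wise pairing `naive_cost`. A continuous OT map over high-dimensional latents, as the paper's setting requires, would need a different solver.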
