A Diffusion Model Translator for Efficient Image-to-Image Translation
Mengfei Xia, Yu Zhou, Ran Yi, Yong-Jin Liu, Wenping Wang
TL;DR
This paper addresses the inefficiency of applying diffusion models to image-to-image translation by proposing a Diffusion Model Translator (DMT): a lightweight translator attached at a single preset diffusion timestep, instead of injecting source-image information at every denoising step. It provides a theoretical justification that transferring the distribution between source and target domains at an intermediate step is feasible, and it introduces a practical method to automatically select the translation timestep. The translator is trained via a variational lower bound, which yields a Gaussian mapping with a tractable objective, and a reparameterization ties the translator to the forward diffusion shared by both domains. Empirically, DMT achieves competitive or superior image quality across stylization, colorization, segmentation-to-image, and sketch-to-image tasks while delivering substantial speedups through early translation and DDIM acceleration.
Abstract
Applying diffusion models to image-to-image translation (I2I) has recently received increasing attention due to its practical applications. Previous attempts inject information from the source image into each denoising step for iterative refinement, which makes inference time-consuming. We propose an efficient method that equips a diffusion model with a lightweight translator, dubbed a Diffusion Model Translator (DMT), to accomplish I2I. Specifically, we first offer theoretical justification that, when employing DDPM for the I2I task, it is both feasible and sufficient to transfer the distribution from one domain to another only at some intermediate step. We further observe that translation performance depends strongly on the timestep chosen for domain transfer, and we therefore propose a practical strategy to automatically select an appropriate timestep for a given task. We evaluate our approach on a range of I2I applications, including image stylization, image colorization, segmentation to image, and sketch to image, to validate its efficacy and general utility. Comparisons show that DMT surpasses existing methods in both quality and efficiency. Code will be made publicly available.
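
The single-timestep idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear noise schedule, the timestep value 200, and the names make_alpha_bar, forward_diffuse, translate_at_timestep, identity_translator, and dummy_denoise_step are all hypothetical, and both the translator and the denoiser are stand-ins for trained networks. Only the forward-diffusion formula is the standard DDPM one.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear beta schedule; returns the cumulative product of (1 - beta).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    # Standard DDPM forward sample:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def translate_at_timestep(x_src, t, alpha_bar, translator, denoise_step, rng):
    # 1) Noise the source image up to the chosen timestep t.
    x_t_src = forward_diffuse(x_src, t, alpha_bar, rng)
    # 2) Apply the translator once, at timestep t only.
    x = translator(x_t_src, t)
    # 3) Run the target-domain reverse process from step t down to step 0.
    for s in range(t, -1, -1):
        x = denoise_step(x, s)
    return x

# Stand-ins so the sketch runs end to end (hypothetical, untrained).
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x_src = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
calls = []

def identity_translator(x_t, t):
    return x_t

def dummy_denoise_step(x, s):
    calls.append(s)  # record which reverse steps were visited
    return x

out = translate_at_timestep(
    x_src, 200, alpha_bar, identity_translator, dummy_denoise_step, rng
)
```

In this sketch the source image is consulted only once, when the translator is applied, and the reverse process starts at step t instead of at the final noise level. A DDIM-style sampler would additionally visit only a subset of the steps in the loop.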
