One-Step Image Translation with Text-to-Image Models
Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu
TL;DR
This work presents CycleGAN-Turbo and pix2pix-Turbo, one-step image-to-image translation methods that adapt pre-trained diffusion backbones via adversarial learning, for unpaired and paired data respectively. The approach feeds the conditioning input directly to the model, trains the architecture end to end, and adds LoRA adapters and skip connections. Together these choices preserve input structure while enabling fast inference and strong translation quality, often matching or surpassing GAN-based and diffusion-based baselines. Experiments on day-night, weather, and sketch/edge-to-image tasks show substantial speed advantages and competitive results, and ablations confirm the importance of the key design choices. The findings suggest that one-step diffusion models can serve as versatile backbones for a range of GAN objectives, enabling real-time, flexible image translation with a relatively small fine-tuning footprint.
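To illustrate why LoRA adapters keep the fine-tuning footprint small, here is a minimal sketch of the generic low-rank update W + (alpha / r) * B A applied to one frozen weight matrix. It is not the authors' implementation: the layer sizes, rank, scaling, and helper names (`matmul`, `lora_weight`) are assumptions chosen for illustration, and this section does not state which layers or ranks the paper uses.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    delta = matmul(b, a)  # low-rank update with shape (d_out, d_in)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

random.seed(0)
d_out, d_in, rank = 16, 16, 2  # hypothetical sizes, not taken from the paper

# Frozen pre-trained weight (random stand-in for a real backbone layer).
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapter factors: A starts small and random, B starts at zero.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

w_adapted = lora_weight(w, a, b, alpha=rank)

full_params = d_out * d_in           # weights updated by full fine-tuning
lora_params = rank * (d_in + d_out)  # weights updated by the adapter
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained one exactly, and training only updates the two small factors (64 values here, against 256 for the full matrix).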
Abstract
In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like ControlNet for Sketch2Photo and Edge2Image, but with single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at https://github.com/GaParmar/img2img-turbo.
