Seed-to-Seed: Image Translation in Diffusion Seed Space
Or Greenberg, Eran Kishon, Dani Lischinski
TL;DR
Seed-to-Seed Translation (StS) reframes unpaired Image-to-Image Translation as seed-space translation within a pretrained diffusion model, translating inverted seeds with a CycleGAN-based sts-GAN and generating target images via diffusion sampling from the translated seeds. A ControlNet guides sampling to preserve the source image's structure, enabling faithful transformations such as day-to-night and weather changes in automotive scenes, as well as age/gender edits. The approach demonstrates that inverted seeds encode meaningful semantic information and that seed-space manipulation can outperform GAN- and diffusion-based baselines in preserving structure while achieving target-domain appearance. This seed-space perspective offers a new axis for image editing with diffusion models and suggests broader applicability beyond automotive domains, with potential extensions in more flexible structure conditioning and improved inversion accuracy.
Abstract
We introduce Seed-to-Seed Translation (StS), a novel approach for Image-to-Image Translation using diffusion models (DMs), aimed at translations that require close adherence to the structure of the source image. In contrast to existing methods that modify images during the diffusion sampling process, we leverage the semantic information encoded within the space of inverted seeds of a pretrained DM, which we call the seed-space. We demonstrate that inverted seeds can be used for discriminative tasks, and can also be manipulated to achieve desired transformations in an unpaired image-to-image translation setting. Our method involves training an sts-GAN, an unpaired translation model between source and target seeds, based on CycleGAN. The final translated images are obtained by initiating the DM's sampling process from the translated seeds. A ControlNet is used to preserve the structure of the input image. We demonstrate the effectiveness of our approach on the task of translating automotive scenes, where it outperforms existing GAN-based and diffusion-based methods, as well as on several other unpaired image translation tasks. Our approach offers a fresh perspective on leveraging the semantic information encoded within the seed-space of pretrained DMs for effective image editing and manipulation.
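The three stages described above (invert the source image to a seed, translate the seed, sample from the translated seed) can be illustrated with a minimal toy sketch. It is not the authors' implementation. It assumes a deterministic DDIM-style inversion, which the abstract does not specify, and every name in it (`make_alphas`, `ddim_step`, `invert`, `sample`, `seed_to_seed`, `toy_eps`) is hypothetical. The constant `toy_eps` merely stands in for a pretrained, ControlNet-conditioned noise predictor, and `translate_seed` stands in for the trained sts-GAN generator.

```python
import math

def make_alphas(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    # Cumulative signal coefficients: alphas[0] = 1 (clean image), decreasing afterwards.
    alphas = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        alphas.append(alphas[-1] * (1.0 - beta))
    return alphas

def ddim_step(x, eps, a_from, a_to):
    # Deterministic DDIM update: predict the clean signal, then move it to noise level a_to.
    out = []
    for xi, ei in zip(x, eps):
        x0_pred = (xi - math.sqrt(1.0 - a_from) * ei) / math.sqrt(a_from)
        out.append(math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * ei)
    return out

def invert(image, eps_model, alphas):
    # Run the deterministic process forward in noise level: image -> inverted seed.
    x = list(image)
    for t in range(len(alphas) - 1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t + 1])
    return x

def sample(seed, eps_model, alphas):
    # Run the deterministic process backward in noise level: seed -> image.
    x = list(seed)
    for t in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t - 1])
    return x

def seed_to_seed(image, eps_model, translate_seed, alphas):
    # Invert, translate in seed space, then sample from the translated seed.
    seed = invert(image, eps_model, alphas)
    translated = translate_seed(seed)
    return sample(translated, eps_model, alphas)

# Toy stand-ins (assumptions, not real models).
def toy_eps(x, t):
    # Placeholder for a pretrained, structure-conditioned noise predictor.
    return [0.1] * len(x)

alphas = make_alphas()
image = [0.2, -0.5, 0.9, 0.0]  # a flattened 4-pixel "image"

# With an identity translator, the pipeline reconstructs the source.
roundtrip = seed_to_seed(image, toy_eps, lambda s: s, alphas)

# A non-trivial seed edit changes the sampled output.
shifted = seed_to_seed(image, toy_eps, lambda s: [v + 0.5 for v in s], alphas)
```

With the constant toy predictor, inversion followed by sampling is an exact round trip, so the identity translator returns the source image. With a real noise predictor, DDIM-style inversion is only approximate, so reconstruction would carry some error.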
