Seed-to-Seed: Image Translation in Diffusion Seed Space

Or Greenberg, Eran Kishon, Dani Lischinski

TL;DR

Seed-to-Seed Translation (StS) reframes unpaired Image-to-Image Translation as seed-space translation within a pretrained diffusion model, translating inverted seeds with a CycleGAN-based sts-GAN and generating target images via diffusion sampling from the translated seeds. A ControlNet guides sampling to preserve the source image's structure, enabling faithful transformations such as day-to-night and weather changes in automotive scenes, as well as age/gender edits. The approach demonstrates that inverted seeds encode meaningful semantic information and that seed-space manipulation can outperform GAN- and diffusion-based baselines in preserving structure while achieving target-domain appearance. This seed-space perspective offers a new axis for image editing with diffusion models and suggests broader applicability beyond automotive domains, with potential extensions in more flexible structure conditioning and improved inversion accuracy.

Abstract

We introduce Seed-to-Seed Translation (StS), a novel approach for Image-to-Image Translation using diffusion models (DMs), aimed at translations that require close adherence to the structure of the source image. In contrast to existing methods that modify images during the diffusion sampling process, we leverage the semantic information encoded within the space of inverted seeds of a pretrained DM, which we dub the seed-space. We demonstrate that inverted seeds can be used for discriminative tasks, and can also be manipulated to achieve desired transformations in an unpaired image-to-image translation setting. Our method involves training an sts-GAN, a CycleGAN-based unpaired translation model between source and target seeds. The final translated images are obtained by initiating the DM's sampling process from the translated seeds. A ControlNet is used to preserve the structure of the input image. We demonstrate the effectiveness of our approach for the task of translating automotive scenes, showcasing superior performance compared to existing GAN-based and diffusion-based methods, as well as for several other unpaired image translation tasks. Our approach offers a fresh perspective on leveraging the semantic information encoded within the seed-space of pretrained DMs for effective image editing and manipulation.
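
The three stages named in the abstract (invert the image to a seed, translate the seed, sample from the translated seed) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration rather than the authors' implementation: `toy_eps` stands in for the pretrained noise-prediction network, `translate_seed` is a placeholder for the trained sts-GAN generator, inversion is assumed to be deterministic DDIM inversion, the noise schedule is a simple linear ramp, and the ControlNet structure conditioning is omitted entirely.

```python
import numpy as np

def make_alpha_bars(num_steps=50, lo=0.02):
    """Cumulative signal levels, from 1.0 (clean image) down to `lo` (mostly noise)."""
    return np.linspace(1.0, lo, num_steps + 1)

def toy_eps(x, t):
    """Hypothetical stand-in for the pretrained DM's noise-prediction network."""
    return 0.1 * np.tanh(x)

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM move between two noise levels using one noise estimate."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def invert(x0, eps_model, alpha_bars):
    """Image -> seed: step from t = 0 up to t = T (DDIM inversion)."""
    x = x0
    for t in range(len(alpha_bars) - 1):
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], alpha_bars[t + 1])
    return x

def sample(seed, eps_model, alpha_bars):
    """Seed -> image: step from t = T back down to t = 0 (DDIM sampling)."""
    x = seed
    for t in range(len(alpha_bars) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], alpha_bars[t - 1])
    return x

def translate_seed(seed, shift=0.5):
    """Placeholder for the sts-GAN generator; here only an additive shift."""
    return seed + shift

def seed_to_seed(x0, eps_model=toy_eps, alpha_bars=None, translator=translate_seed):
    """Invert the source image, translate its seed, then sample the output."""
    alpha_bars = make_alpha_bars() if alpha_bars is None else alpha_bars
    z_src = invert(x0, eps_model, alpha_bars)   # source-domain seed
    z_tgt = translator(z_src)                   # target-domain seed
    return sample(z_tgt, eps_model, alpha_bars)

if __name__ == "__main__":
    x_src = np.random.default_rng(0).normal(size=(4, 8, 8))  # toy "latent image"
    x_tgt = seed_to_seed(x_src)
    print(x_src.shape, x_tgt.shape)
```

With a real noise-prediction network, inversion followed by sampling reproduces the source only approximately, which is one reason the TL;DR lists improved inversion accuracy as a possible extension.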

Paper Structure (13 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Seed-to-Seed Translation addresses the unpaired Image-to-Image Translation task by performing the translation in the seed-space of a pretrained diffusion model. The effectiveness of the resulting approach is demonstrated on a variety of image translation tasks.
  • Figure 2: StS Framework Overview. The source image $x^A_{0}$ is first inverted to a corresponding seed $z^A_{T}$. This seed is then translated to a seed $z^B_{T}$ in the target domain, from which sampling yields the target-domain output $x^B_{0}$.
  • Figure 3: Day-to-night translation with StS using different CFG-scales. While achieving a global night-time appearance, a low CFG-scale ($\omega = 1$) may result in a lack of local domain-related semantic effects (middle). Using a higher CFG-scale ($\omega = 5$) introduces these important effects (right). The same prompt "A clear night" is used in both columns.
  • Figure 4: Qualitative comparison for Day-to-Night translation over the BDD100k dataset.
  • Figure 5: Additional examples for different domains over the BDD100k and DENSE datasets. In every pair of images, the left image is the source, while the right one is the translated version.
  • ...and 1 more figures
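
The CFG-scale $\omega$ in the Figure 3 caption refers to classifier-free guidance. As a reminder, and assuming the common formulation (the paper's exact parameterization is not shown in this summary), the guided noise estimate combines the unconditional prediction with the prompt-conditioned one, where $c$ denotes the text prompt and $\varnothing$ the empty prompt:

$$\hat{\epsilon}_\theta(z_t, c) = \epsilon_\theta(z_t, \varnothing) + \omega \left( \epsilon_\theta(z_t, c) - \epsilon_\theta(z_t, \varnothing) \right)$$

Under this convention, $\omega = 1$ reduces to the purely conditional prediction $\epsilon_\theta(z_t, c)$, while $\omega = 5$ extrapolates further in the direction of the prompt, which is consistent with the stronger local night-time effects described in the caption.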