Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation

Junsung Lee, Minsoo Kang, Bohyung Han

TL;DR

A simple but effective training-free approach to diffusion-based image-to-image translation. It introduces a noise correction term, achieves outstanding performance with low latency, and consistently improves existing frameworks when combined with them.

Abstract

We propose a simple but effective training-free approach tailored to diffusion-based image-to-image translation. Our approach revises the original noise prediction network of a pretrained diffusion model by introducing a noise correction term. We formulate the noise correction term as the difference between two noise predictions; one is computed from the denoising network with a progressive interpolation of the source and target prompt embeddings, while the other is the noise prediction with the source prompt embedding. The final noise prediction network is given by a linear combination of the standard denoising term and the noise correction term, where the former is designed to reconstruct must-be-preserved regions while the latter aims to effectively edit regions of interest relevant to the target prompt. Our approach can be easily incorporated into existing image-to-image translation methods based on diffusion models. Extensive experiments verify that the proposed technique achieves outstanding performance with low latency and consistently improves existing frameworks when combined with them.
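
The combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the details below are assumptions made for illustration only: the linear interpolation schedule `beta`, the form `eps_src + gamma * delta` for the linear combination (the weight is named `gamma` after the $\gamma$ in Figure 5), and `toy_eps`, a hypothetical stand-in for a pretrained noise prediction network.

```python
import numpy as np


def interpolate_prompt(y_src, y_tgt, beta):
    """Linearly interpolate source and target prompt embeddings (assumed form)."""
    return (1.0 - beta) * y_src + beta * y_tgt


def corrected_noise(eps_fn, x, t, y_src, y_tgt, beta, gamma):
    """Standard denoising term plus a weighted noise correction term.

    The correction is the difference between the prediction under the
    interpolated prompt embedding and the prediction under the source
    prompt embedding, as stated in the abstract. The weighting by
    `gamma` is an assumption of this sketch.
    """
    eps_src = eps_fn(x, t, y_src)                # standard denoising term
    y_t = interpolate_prompt(y_src, y_tgt, beta)
    delta = eps_fn(x, t, y_t) - eps_src          # noise correction term
    return eps_src + gamma * delta


def toy_eps(x, t, y):
    """Hypothetical denoiser: NOT a real diffusion model, just a placeholder."""
    return 0.1 * x + y.mean()


x = np.ones((4, 4))       # toy latent
y_src = np.zeros(8)       # toy source prompt embedding
y_tgt = np.ones(8)        # toy target prompt embedding
num_steps = 5

for step in range(num_steps):
    # Assumed schedule: move progressively from the source to the target prompt.
    beta = (step + 1) / num_steps
    eps = corrected_noise(toy_eps, x, step, y_src, y_tgt, beta, gamma=1.0)
```

With `gamma=0` the sketch reduces to the source-prompt prediction, which matches the abstract's description of the standard term as the part responsible for reconstruction; larger values give the correction term more influence over the edit.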

Paper Structure (27 sections, 13 equations, 5 figures, 6 tables, 1 algorithm)


Figures (5)

  • Figure 1: Image-to-image translation results using the proposed method on data sampled from the LAION-5B dataset [schuhmann2022laion]. Our approach effectively preserves the structure and the background in source images while successfully editing the local region of interest.
  • Figure 2: Visualization of the progressively updated noise correction term $\Delta \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}_t)$ over time for each pair of source and target images.
  • Figure 3: Qualitative comparisons between PIC and state-of-the-art methods [hertz2023prompt, tumanyan2023plug, parmar2023zero] on images from LAION-5B [schuhmann2022laion] using the pretrained Stable Diffusion [rombach2022high]. PIC generates target images with higher fidelity than the other methods in all tasks. Note that all algorithms fail to preserve the pose and texture of the source image in the last task, but PIC still shows a favorable result.
  • Figure 4: Qualitative results of existing state-of-the-art methods and their combinations with PIC based on the pretrained Stable Diffusion [rombach2022high]: (top) Prompt-to-Prompt [hertz2023prompt], (middle) Plug-and-Play [tumanyan2023plug], and (bottom) Pix2Pix-Zero [parmar2023zero]. The examples are sampled from LAION-5B [schuhmann2022laion].
  • Figure 5: Qualitative results of the proposed method when varying $\gamma$ on data sampled from the LAION-5B dataset [schuhmann2022laion], relying on the pretrained Stable Diffusion [rombach2022high].