Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation

Junsung Lee; Minsoo Kang; Bohyung Han

TL;DR

A simple but effective training-free approach to diffusion-based image-to-image translation that introduces a noise correction term. It achieves outstanding performance with low latency and consistently improves existing frameworks when combined with them.

Abstract

We propose a simple but effective training-free approach tailored to diffusion-based image-to-image translation. Our approach revises the original noise prediction network of a pretrained diffusion model by introducing a noise correction term. We formulate the noise correction term as the difference between two noise predictions: one is computed by the denoising network with a progressive interpolation of the source and target prompt embeddings, while the other is the noise prediction with the source prompt embedding. The final noise prediction network is a linear combination of the standard denoising term and the noise correction term; the former is designed to reconstruct regions that must be preserved, while the latter aims to effectively edit the regions of interest relevant to the target prompt. Our approach can be easily incorporated into existing diffusion-based image-to-image translation methods. Extensive experiments verify that the proposed technique achieves outstanding performance with low latency and consistently improves existing frameworks when combined with them.
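One way to write the description above as equations is sketched below. It is a hedged reading, not the paper's own formulation: the embeddings $\mathbf{y}^{\text{src}}$ and $\mathbf{y}^{\text{tgt}}$, the interpolation weight $\alpha_t$ (assumed to rise from 0 at the first denoising step $t = T$ to 1 at $t = 0$), the unit weight on the standard term, and the choice of the source prompt for that standard term are all assumptions. The symbols $\Delta \epsilon_\theta$, $\mathbf{x}^{\text{tgt}}$, $\mathbf{y}_t$, and $\gamma$ follow the notation of the figure captions.

```latex
\begin{aligned}
\mathbf{y}_t &= (1 - \alpha_t)\,\mathbf{y}^{\text{src}} + \alpha_t\,\mathbf{y}^{\text{tgt}},
  \qquad \alpha_T = 0,\ \alpha_0 = 1, \\
\Delta \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}_t)
  &= \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}_t)
   - \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}^{\text{src}}), \\
\tilde{\epsilon}_\theta(\mathbf{x}^{\text{tgt}}, t)
  &= \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}^{\text{src}})
   + \gamma\,\Delta \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}_t).
\end{aligned}
```

Under these assumptions, $\gamma = 0$ reduces $\tilde{\epsilon}_\theta$ to the standard source-conditioned prediction, and $\gamma = 1$ reduces it to the prediction under the interpolated embedding $\mathbf{y}_t$.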

Paper Structure (27 sections, 13 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Text-to-Image Generation based on Diffusion Models
Text-Driven Image Editing based on Diffusion Models
Text-Driven Image-to-Image Translation
Inference of Latent Variables for Source Images
Reverse Process of Target Images
Our Approach
Overview
Noise Correction
Prompt Interpolation
Word replacement
Adding phrases
Integration into Existing Methods
Prompt-to-Prompt hertz2023prompt
...and 12 more sections

Figures (5)

Figure 1: Image-to-image translation results using the proposed method on data sampled from the LAION-5B dataset schuhmann2022laion. Our approach effectively preserves the structure and the background in source images while successfully editing the local region of interest.
Figure 2: Visualization of the progressively updated noise correction term $\Delta \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}_t)$ over time for each pair of source and target images.
Figure 3: Qualitative comparisons between PIC and state-of-the-art methods hertz2023prompt, tumanyan2023plug, parmar2023zero on images from LAION-5B schuhmann2022laion using the pretrained Stable Diffusion rombach2022high. PIC generates target images with higher fidelity than the other methods in all tasks. Note that all algorithms fail to preserve the pose and texture of the source image in the last task, but PIC still shows a favorable result.
Figure 4: Qualitative results of existing state-of-the-art methods and their combinations with PIC based on the pretrained Stable Diffusion rombach2022high: (top) Prompt-to-Prompt hertz2023prompt, (middle) Plug-and-Play tumanyan2023plug, and (bottom) Pix2Pix-Zero parmar2023zero. The examples are sampled from LAION-5B schuhmann2022laion.
Figure 5: Qualitative results of the proposed method by varying $\gamma$ on data sampled from the LAION-5B dataset schuhmann2022laion, relying on the pretrained Stable Diffusion rombach2022high.
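Figure 2 visualizes the noise correction term $\Delta \epsilon_\theta(\mathbf{x}^{\text{tgt}}, t, \mathbf{y}_t)$ over time, and Figure 5 varies its weight $\gamma$. The sketch below mirrors the abstract's description with NumPy and a toy stand-in for the denoising network. The linear interpolation schedule, the use of the source prompt in the standard term, and every name (`interp_weight`, `interpolate_prompt`, `corrected_noise`, `toy_eps`) are hypothetical; a real implementation would call a pretrained model such as Stable Diffusion in place of `toy_eps`.

```python
from typing import Callable

import numpy as np

# Assumed predictor signature: (latent, timestep, prompt embedding) -> predicted noise.
NoisePredictor = Callable[[np.ndarray, int, np.ndarray], np.ndarray]


def interp_weight(t: int, num_steps: int) -> float:
    """Hypothetical linear schedule: 0 at t = num_steps (first denoising step), 1 at t = 0."""
    return 1.0 - t / num_steps


def interpolate_prompt(y_src: np.ndarray, y_tgt: np.ndarray, t: int, num_steps: int) -> np.ndarray:
    """Progressively move the prompt embedding from the source toward the target."""
    alpha = interp_weight(t, num_steps)
    return (1.0 - alpha) * y_src + alpha * y_tgt


def corrected_noise(
    eps_theta: NoisePredictor,
    x_tgt: np.ndarray,
    t: int,
    y_src: np.ndarray,
    y_tgt: np.ndarray,
    num_steps: int,
    gamma: float,
) -> np.ndarray:
    """Standard denoising term plus gamma times the noise correction term."""
    y_t = interpolate_prompt(y_src, y_tgt, t, num_steps)
    eps_src = eps_theta(x_tgt, t, y_src)        # standard term (assumed source-conditioned)
    delta = eps_theta(x_tgt, t, y_t) - eps_src  # noise correction term
    return eps_src + gamma * delta


def toy_eps(x: np.ndarray, t: int, y: np.ndarray) -> np.ndarray:
    """Toy stand-in for a denoising network; not a real model."""
    return 0.5 * x + y


if __name__ == "__main__":
    x = np.ones(4)
    y_src, y_tgt = np.zeros(4), np.full(4, 2.0)
    T = 50
    # Norm of the gamma-weighted correction at a few timesteps (gamma = 1).
    for t in (T, T // 2, 0):
        full = corrected_noise(toy_eps, x, t, y_src, y_tgt, T, gamma=1.0)
        base = corrected_noise(toy_eps, x, t, y_src, y_tgt, T, gamma=0.0)
        print(t, float(np.linalg.norm(full - base)))
```

In this sketch the correction vanishes at the first step, because the interpolated embedding still equals the source embedding, and grows as denoising proceeds; setting `gamma = 0` recovers the standard prediction. Each step costs two network evaluations here, since the correction needs both the interpolated and the source-conditioned predictions.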
