ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing

Jimin Dai, Yingzhen Zhang, Shuo Chen, Jian Yang, Lei Luo

TL;DR

This work proposes a novel method, referred to as ERDDCI (Exact Reversible Diffusion via Dual-Chain Inversion), which uses the new Dual-Chain Inversion for joint inference to derive an exact reversible diffusion process and achieves high-quality image editing.

Abstract

Diffusion models (DMs) have been successfully applied to real image editing. These models typically invert images into latent noise vectors from which the original images can be reconstructed (a process known as inversion), and then edit them during inference. However, popular DMs often rely on a local linearization assumption, under which the noise injected during inversion is expected to approximate the noise removed during inference. While DMs generate images efficiently under this assumption, it also causes errors to accumulate during the diffusion process, ultimately degrading the quality of real image reconstruction and editing. To address this issue, we propose a novel method, referred to as ERDDCI (Exact Reversible Diffusion via Dual-Chain Inversion). ERDDCI uses the new Dual-Chain Inversion (DCI) for joint inference to derive an exact reversible diffusion process. By using DCI, our method avoids the cumbersome optimization process required by existing inversion approaches and achieves high-quality image editing. Additionally, to accommodate image operations under high guidance scales, we introduce a dynamic control strategy that enables more refined image reconstruction and editing. Our experiments demonstrate that ERDDCI significantly outperforms state-of-the-art methods in a 50-step diffusion process. It achieves rapid and precise image reconstruction with an SSIM of 0.999 and an LPIPS of 0.001, and also delivers competitive results in image editing.
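To make the local linearization assumption concrete, the following is a minimal NumPy sketch of a plain DDIM inversion and inference round trip. It is not the authors' implementation and does not implement DCI; it only illustrates why reusing a noise prediction made at a neighbouring point leaves a reconstruction residual. The linear beta schedule, the helper names (`make_alphas`, `ddim_move`, `round_trip_error`), and the stand-in predictors `toy_eps` and `const_eps` are all assumptions introduced for illustration.

```python
import numpy as np

def make_alphas(num_steps, train_steps=1000):
    """Cumulative alpha-bar values at num_steps + 1 evenly spaced timesteps.

    Index 0 is (almost) clean, index num_steps is (almost) pure noise.
    The linear beta schedule is an assumption made for illustration.
    """
    betas = np.linspace(1e-4, 2e-2, train_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    idx = np.linspace(0, train_steps - 1, num_steps + 1).round().astype(int)
    return alpha_bar[idx]

def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update from noise level a_from to a_to."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def toy_eps(x, t):
    """Hypothetical stand-in for a trained noise predictor (depends on x)."""
    return np.tanh(x)

def const_eps(x, t):
    """Predictor that ignores x, so the linearization assumption holds exactly."""
    return np.full_like(x, 0.3)

def round_trip_error(x0, eps_fn, num_steps=50):
    """Invert x0 to noise, run inference back, and return the max abs error."""
    alphas = make_alphas(num_steps)
    x = x0.copy()
    # Inversion: the noise for step i -> i+1 is predicted at the current point x_i.
    for i in range(num_steps):
        x = ddim_move(x, eps_fn(x, i), alphas[i], alphas[i + 1])
    # Inference: the noise for step i+1 -> i is predicted at x_{i+1} instead.
    for i in reversed(range(num_steps)):
        x = ddim_move(x, eps_fn(x, i + 1), alphas[i + 1], alphas[i])
    return float(np.max(np.abs(x - x0)))

rng = np.random.default_rng(0)
x0 = rng.normal(size=16)
err_toy = round_trip_error(x0, toy_eps)
err_const = round_trip_error(x0, const_eps)
print(f"x-dependent predictor: max round-trip error = {err_toy:.3e}")
print(f"constant predictor:    max round-trip error = {err_const:.3e}")
```

With the constant predictor the two chains cancel up to floating-point error, because the noise added at each inversion step is exactly the noise removed at the matching inference step. With the state-dependent predictor the two evaluations happen at different points, so a residual remains and is carried through the remaining steps. This is the kind of mismatch that an exactly reversible scheme is meant to remove.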

Paper Structure

This paper contains 16 sections, 16 equations, 11 figures, 4 tables, 2 algorithms.

Figures (11)

  • Figure 1: Error Accumulation and Amplification in DDIM Inference. DDIM relies on a local linearization assumption, which causes errors to accumulate step by step during the inference process following image inversion (Part i), so that the semantic information of the reconstructed image deviates from the original image. These errors are further amplified under high guidance scales (Part ii), reducing the fidelity of the image. Black arrows represent the inference or inversion process. Black curved arrows indicate that each step generates errors. Gray arrows show the gradual accumulation of errors. Solid arrows signify single-step processes, and dashed arrows denote multi-step processes.
  • Figure 2: Dual-chain inversion overview. DCI makes DMs exactly reversible between the auxiliary inversion chain and the inference chain. $\oplus$ represents the noise-adding operation, which follows Eq. (\ref{eq:8}) in the original DDIM inversion chain and Eq. (\ref{eq:10}) in the auxiliary inversion chain; $\ominus$ represents the denoising operation, which follows Eq. (\ref{eq:11}).
  • Figure 3: Predicted Noise on Different Inference Trajectories over Timesteps. The color of the trajectory gradually becomes lighter with the reverse timesteps ($T \longrightarrow 0$).
  • Figure 4: Reconstruction effects for various diffusion inversion methods. The AE image is decoded from Stable Diffusion without inversion and used as the ground truth for other methods' reconstruction.
  • Figure 5: Quantitative evaluation of image reconstruction results for ERDDCI with the dynamic control strategy (DCS) and for NTI. Both methods optimize the reconstruction under high guidance scales.
  • ...and 6 more figures