Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing

Jiancheng Huang, Yi Huang, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen

TL;DR

This work tackles real-image editing with diffusion models by addressing reconstruction failures of DDIM Inversion. It introduces Dual-Schedule Inversion, which uses a primary and an auxiliary time schedule for both inversion and sampling to guarantee reversibility without fine-tuning, and couples it with an editing task classifier to automatically select among editing strategies. The approach yields superior reconstruction fidelity and editing quality across multiple tasks, achieving performance close to an upper bound and enabling user-friendly, semantically faithful edits. By integrating with existing editing methods (e.g., P2P, MasaCtrl, SDEdit) and automating task selection, the method holds practical promise for robust real-image editing in real-world applications.

Abstract

Text-conditional image editing is a practical AIGC task that has recently emerged with great commercial and academic value. For real image editing, most diffusion model-based methods use DDIM Inversion as the first stage before editing. However, DDIM Inversion often results in reconstruction failure, leading to unsatisfactory performance for downstream editing. To address this problem, we first analyze why the reconstruction via DDIM Inversion fails. We then propose a new inversion and sampling method named Dual-Schedule Inversion. We also design a classifier to adaptively combine Dual-Schedule Inversion with different editing methods for user-friendly image editing. Our work can achieve superior reconstruction and editing performance with the following advantages: 1) It can reconstruct real images perfectly without fine-tuning, and its reversibility is guaranteed mathematically. 2) The edited object/scene conforms to the semantics of the text prompt. 3) The unedited parts of the object/scene retain the original identity.
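
The reconstruction failure mentioned above can be illustrated with a toy example. The sketch below is not the paper's analysis, which is not reproduced in this section. It shows the commonly cited cause: the inversion step evaluates the noise predictor at the less noisy latent, while the sampling step evaluates it at the noisier one, so the two steps are not exact inverses. The scalar latent, the `toy_eps` predictor, and the `ALPHA_BAR` values are all assumptions made for illustration.

```python
import math

# Hypothetical cumulative alpha values (alpha-bar), ordered from nearly clean to noisy.
ALPHA_BAR = [0.999, 0.9, 0.7, 0.5, 0.3, 0.1]

def toy_eps(z, i):
    """Stand-in for the noise predictor (assumption); it only needs to depend on z."""
    return math.tanh(z)

def ddim_step(z, a_from, a_to, eps):
    """Deterministic DDIM update from noise level a_from to a_to for a given eps."""
    z0_pred = (z - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * z0_pred + math.sqrt(1.0 - a_to) * eps

def invert(z0):
    """DDIM Inversion: eps is evaluated at the current (less noisy) latent."""
    z, used_eps = z0, []
    for i in range(len(ALPHA_BAR) - 1):
        eps = toy_eps(z, i)
        used_eps.append(eps)
        z = ddim_step(z, ALPHA_BAR[i], ALPHA_BAR[i + 1], eps)
    return z, used_eps

def sample(z_T, stored_eps=None):
    """DDIM sampling; optionally reuse the eps values recorded during inversion."""
    z = z_T
    for i in range(len(ALPHA_BAR) - 1, 0, -1):
        eps = toy_eps(z, i) if stored_eps is None else stored_eps[i - 1]
        z = ddim_step(z, ALPHA_BAR[i], ALPHA_BAR[i - 1], eps)
    return z

z0 = 1.0
z_T, used = invert(z0)
err_plain = abs(sample(z_T) - z0)         # eps recomputed at the noisier latents
err_stored = abs(sample(z_T, used) - z0)  # eps reused from inversion
print(f"round-trip error, recomputed eps: {err_plain:.4f}")
print(f"round-trip error, reused eps:     {err_stored:.2e}")
```

In this toy setting the round trip is exact only when sampling sees the same noise predictions that inversion used, which is the kind of consistency a reversible inversion scheme has to provide.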

Paper Structure

This paper contains 19 sections, 1 theorem, 22 equations, 19 figures, 8 tables.

Key Result

Proposition 1

Let $\bar{z}_1^{p}$ and $\bar{z}_{10}^{a}$ be obtained by the original forward process of DDIM. Then $\tilde{z}_{1}^{p}=\bar{z}_{1}^{p}$ where $\tilde{z}_{1}^{p}$ is obtained by the Dual-Schedule Inversion method presented in Section 4.3 of the main paper.

Figures (19)

  • Figure 1: Examples of reconstruction by different inversion methods with guidance scale $4$. While Null-Text Inversion requires fine-tuning, the other three methods do not. Dual-Schedule Inversion achieves excellent performance without fine-tuning.
  • Figure 2: DDIM Inversion is irreversible.
  • Figure 3: Pipeline of Dual-Schedule Inversion, which is divided into an inversion stage and a sampling stage. The inversion stage (a) produces $\bar{z}^p_{981}$ and $\bar{z}^a_{970}$. The sampling stage performs reconstruction (b) or editing (c), depending on the target prompt. Both stages use two time schedules; the schedules $[1,21,41,...,981]$ and $[10,30,50,...,970]$ are shown as an example.
  • Figure 4: Comparison with SOTA editing methods on real images for Object Replacement. N-P Inversion denotes Negative-Prompt Inversion.
  • Figure 5: Comparison with SOTA editing methods on real images for Action Editing.
  • ...and 14 more figures
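
The two example schedules in the Figure 3 caption can be written out explicitly. The sketch below only builds the two timestep lists and shows how they interleave; the step size of 20 is read off the listed values, the helper name `make_schedule` is hypothetical, and how the schedules enter the update rule is not described in this section.

```python
def make_schedule(start, stop, step):
    """Return the timesteps start, start + step, ..., up to and including stop."""
    return list(range(start, stop + 1, step))

# Example schedules from the Figure 3 caption (step of 20 inferred from the values).
primary = make_schedule(1, 981, 20)     # [1, 21, 41, ..., 981]
auxiliary = make_schedule(10, 970, 20)  # [10, 30, 50, ..., 970]

# Merging the two lists shows that primary and auxiliary timesteps alternate.
merged = sorted(
    [(t, "p") for t in primary] + [(t, "a") for t in auxiliary]
)

print(len(primary), len(auxiliary))
print(merged[:4])
```

With these example values the auxiliary schedule sits between consecutive primary timesteps, starting after the first primary timestep and ending before the last one.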

Theorems & Definitions (2)

  • Proposition 1
  • proof