Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong Liu, Feng Tian, Guang Dai, Jingdong Wang, Qianying Wang

TL;DR

This paper introduces the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing. It reduces noise prediction errors, enabling more faithful editing that preserves the original content of the source image.

Abstract

Text-guided diffusion models have significantly advanced image editing, enabling high-quality and diverse modifications driven by text prompts. However, effective editing requires inverting the source image into a latent space, a process often hindered by prediction errors inherent in DDIM inversion. These errors accumulate during the diffusion process, resulting in inferior content preservation and edit fidelity, especially with conditional inputs. We address these challenges by investigating the primary contributors to error accumulation in DDIM inversion and identify the singularity problem in traditional noise schedules as a key issue. To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing. This schedule reduces noise prediction errors, enabling more faithful editing that preserves the original content of the source image. Our approach requires no additional retraining and is compatible with various existing editing methods. Experiments across eight editing tasks demonstrate the Logistic Schedule's superior performance in content preservation and edit fidelity compared to traditional noise schedules, highlighting its adaptability and effectiveness.

Paper Structure

This paper contains 48 sections, 3 theorems, 71 equations, 20 figures, 10 tables.

Key Result

Proposition 3.1

During the inversion process, there exists a singularity at $t=0$ for both the scaled linear and cosine schedules (Fig. 2, right). This singularity significantly affects the starting point of the inversion process during image editing tasks. Properly modeling $\mathrm{d}\mathbf{x}_t/\mathrm{d}t$ ensures that the inversion closely aligns with the true continuous dynamics of

Figures (20)

  • Figure 1: Compared to the linear noise schedule, the Logistic Schedule ❶ demonstrates high fidelity in attribute content editing (a, b) with EF-DDPM [huberman2023edit], ❷ preserves the high-level semantics of the source image while performing object translation (c) with pix2pix-zero [parmar2023zero] and style/scene transfer (d, e) with StyleDiffusion [wang2023stylediffusion], and ❸ successfully conducts non-rigid alteration (f) via MasaCtrl [cao2023masactrl]. The text prompt for each input image is shown beneath each sample, with the words introduced for editing highlighted in red.
  • Figure 2: Illustration of the DDIM inversion in image editing and its challenges. Left: starting from the source image $\mathbf{x}_0$, the ideal latent $\mathbf{x}_t$ is approximated by the inverted latent $\mathbf{x}_t^*$ using DDIM inversion. The perturbed noisy latent $\mathbf{x}_T^*$ is then sampled in two branches—one for the source condition and one for the target condition—yielding the reconstructed and edited images respectively. Right: the numerical computations of $\mathrm{d}\mathbf{x}_t/\mathrm{d}t$ for scaled linear and cosine noise schedules, highlighting the singularity at $t=0$ that leads to potential inaccuracies in noise prediction during inversion.
  • Figure 3: Left: trends of $\sqrt{1-\alpha_t}$ (noise scales) for scaled linear, cosine, and logistic noise schedules. Right: $\mathrm{d}\mathbf{x}_t/\mathrm{d}t$ for the logistic schedule, highlighting its smooth transition, which prevents singularities and maintains the integrity of the initial latent vector $\mathbf{x}_0$.
  • Figure 4: Analysis of noise space for different schedules. Left: logSNR trends, where the logistic schedule maintains a more gradual decline. Right: inversion processes, with the logistic schedule preserving more details in the initial stage and minimizing low-frequency retention in the final stage.
  • Figure 5: Qualitative comparison of the Logistic Schedule with linear and cosine schedules across various image editing tasks. To preserve background content during ① attribute editing tasks (e.g., colors and materials), we employ Edit Friendly DDPM [huberman2023edit]; for tasks requiring background preservation such as ② object translation, we use Zero-shot Pix2Pix [parmar2023zero]; for tasks involving ③ scene or style transfer, while maintaining object semantics, we utilize StyleDiffusion [wang2023stylediffusion]; to validate spatial context preservation in ④ non-rigid editing tasks (e.g., motion, pose), we consider MasaCtrl [cao2023masactrl].
  • ...and 15 more figures
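
The quantities plotted in Figures 2 to 4 (noise scale $\sqrt{1-\alpha_t}$, its change per step, and logSNR) can be reproduced for the two baseline schedules with a few lines of NumPy. The sketch below is illustrative only and is not the authors' code. It assumes the scaled linear schedule uses the beta range common in Stable Diffusion configurations (0.00085 to 0.012 over 1000 steps) and that the cosine schedule uses the usual offset $s=0.008$; neither value is stated in this summary. The function names are hypothetical, and the Logistic Schedule is not implemented because its formula does not appear here.

```python
import numpy as np


def scaled_linear_alpha_bar(T=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta_t) for a scaled linear schedule.

    The beta range is an assumed default, not a value taken from the paper.
    """
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, T) ** 2
    return np.cumprod(1.0 - betas)


def cosine_alpha_bar(T=1000, s=0.008):
    """Cumulative alpha for a cosine schedule at t = 1..T (offset s assumed)."""
    t = np.arange(0, T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return (f / f[0])[1:]  # normalise so that the value at t = 0 is exactly 1


def noise_scale(alpha_bar):
    """sqrt(1 - alpha_bar_t), the weight on the noise term of x_t."""
    return np.sqrt(1.0 - alpha_bar)


def log_snr(alpha_bar):
    """log(alpha_bar_t / (1 - alpha_bar_t))."""
    return np.log(alpha_bar) - np.log1p(-alpha_bar)


def noise_scale_steps(alpha_bar):
    """Finite-difference change of the noise scale per step, starting at t = 0.

    The noise scale is 0 at t = 0 (clean image), so that value is prepended.
    """
    sigma = np.concatenate([[0.0], noise_scale(alpha_bar)])
    return np.diff(sigma)


for name, fn in [("scaled linear", scaled_linear_alpha_bar),
                 ("cosine", cosine_alpha_bar)]:
    ab = fn()
    steps = noise_scale_steps(ab)
    print(name, "first three noise-scale steps:", steps[:3])
    print(name, "logSNR at first and last step:", log_snr(ab)[[0, -1]])
```

Under these assumptions the per-step change of the noise scale is largest at the very first step for both schedules. This is a discrete view of the steep start near $t=0$: the derivative of $\sqrt{1-\alpha_t}$ has $\sqrt{1-\alpha_t}$ in its denominator, which vanishes at $t=0$.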

Theorems & Definitions (8)

  • Proposition 3.1: Singularity in Inversion Process
  • Theorem A.1: DDIM ODEs
  • proof
  • Proposition B.1: Singularity in Inversion Process
  • proof
  • proof
  • proof
  • proof