The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li

TL;DR

The paper casts diffusion-based image editing in a unified probabilistic framework in which each edit defines a task-specific SDE or ODE. It proves that the KL divergence between the marginal distributions of the editing SDE and the original SDE strictly decreases as time approaches zero, whereas the divergence stays constant for the ODE counterparts, and it introduces SDE-Drag together with the DragBench benchmark. In inpainting, image-to-image translation, and dragging experiments, SDE-based editing consistently outperforms the corresponding ODE baselines; in a user study on DragBench, SDE-Drag is preferred over ODE-Drag, DragDiffusion, and DragGAN at a comparable average time cost per image. The work thus gives a theoretical account of why randomness helps diffusion-based editing and delivers editing tools that work on open-set images.

Abstract

We present a unified probabilistic formulation for diffusion-based image editing, where a latent variable is edited in a task-specific manner and generally deviates from the corresponding marginal distribution induced by the original stochastic or ordinary differential equation (SDE or ODE). Instead, it defines a corresponding SDE or ODE for editing. In the formulation, we prove that the Kullback-Leibler divergence between the marginal distributions of the two SDEs gradually decreases while that for the ODEs remains unchanged as the time approaches zero, which shows the promise of SDE in image editing. Inspired by it, we provide the SDE counterparts for widely used ODE baselines in various tasks including inpainting and image-to-image translation, where SDE shows a consistent and substantial improvement. Moreover, we propose SDE-Drag, a simple yet effective method built upon the SDE formulation for point-based content dragging. We build a challenging benchmark (termed DragBench) with open-set natural, art, and AI-generated images for evaluation. A user study on DragBench indicates that SDE-Drag significantly outperforms our ODE baseline, existing diffusion-based methods, and the renowned DragGAN. Our results demonstrate the superiority and versatility of SDE in image editing and push the boundary of diffusion-based editing methods.

Paper Structure

This paper contains 49 sections, 8 theorems, 33 equations, 21 figures, 8 tables, and 5 algorithms.

Key Result

Theorem 3.1

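Let $\tilde{p}_{t}$ and $p_{t}$ be the marginal distributions of two SDEs (see Eq. (equ:reverse_sde)) at time $t$, respectively. For any $0 \le s < t \le T$, if $\tilde{p}_{t} \neq p_{t}$, then under some mild regularity conditions listed in [DBLP:conf/icml/LuZB0LZ22], it holds that

$$D_{\mathrm{KL}}(\tilde{p}_{s} \Vert p_{s}) = D_{\mathrm{KL}}(\tilde{p}_{t} \Vert p_{t}) - \int_{s}^{t} g(\tau)^{2} \, D_{\mathrm{Fisher}}(\tilde{p}_{\tau} \Vert p_{\tau}) \, \mathrm{d}\tau < D_{\mathrm{KL}}(\tilde{p}_{t} \Vert p_{t}),$$

where $D_{\mathrm{KL}}(\cdot \Vert \cdot)$ denotes the KL divergence, $D_{\mathrm{Fisher}}(\cdot \Vert \cdot)$ denotes the Fisher divergence, and $g$ is the diffusion coefficient of the SDE.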
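
The contrast between Theorem 3.1 (contraction for SDEs) and Theorem 3.2 (invariance for ODEs) can be checked numerically in a setting where everything is Gaussian. The sketch below is an illustration under assumed choices, not the paper's experiment: the data distribution $\mathcal{N}(0, 0.25)$, the forward process $\mathrm{d}x = \mathrm{d}w$, the mismatched start $\mathcal{N}(1, 0.5)$, the Euler discretization, and all function names are hypothetical. Because the exact score is linear in $x$, the marginals stay Gaussian and only a mean and a variance need to be tracked.

```python
import math


def gaussian_kl(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for 1D Gaussians (v1, v2 are variances)."""
    return 0.5 * (v1 / v2 + (m1 - m2) ** 2 / v2 - 1.0 - math.log(v1 / v2))


def linear_step(state, scale, noise_var):
    """Push a Gaussian (mean, var) through x -> scale * x + N(0, noise_var)."""
    mean, var = state
    return scale * mean, scale * scale * var + noise_var


def kl_trajectories(sigma0_sq=0.25, T=1.0, steps=1000, m_tilde=1.0, v_tilde=0.5):
    """Return (kl_sde, kl_ode): KL(p~_t || p_t) recorded from t = T down to t = 0.

    Assumed toy setup: p_t = N(0, sigma0_sq + t), so score(x, t) = -x / (sigma0_sq + t).
    The reference p starts at the true p_T; the mismatched p~ starts at N(m_tilde, v_tilde).
    Both are propagated by the same Euler-discretized reverse dynamics.
    """
    dt = T / steps
    ref_sde = ref_ode = (0.0, sigma0_sq + T)
    edit_sde = edit_ode = (m_tilde, v_tilde)
    kl_sde = [gaussian_kl(*edit_sde, *ref_sde)]
    kl_ode = [gaussian_kl(*edit_ode, *ref_ode)]
    for i in range(steps, 0, -1):
        c = dt / (sigma0_sq + i * dt)
        # Reverse SDE step: x <- x + dt * score + sqrt(dt) * z  (g = 1).
        ref_sde = linear_step(ref_sde, 1.0 - c, dt)
        edit_sde = linear_step(edit_sde, 1.0 - c, dt)
        # Probability-flow ODE step: x <- x + 0.5 * dt * score, no noise.
        ref_ode = linear_step(ref_ode, 1.0 - 0.5 * c, 0.0)
        edit_ode = linear_step(edit_ode, 1.0 - 0.5 * c, 0.0)
        kl_sde.append(gaussian_kl(*edit_sde, *ref_sde))
        kl_ode.append(gaussian_kl(*edit_ode, *ref_ode))
    return kl_sde, kl_ode


if __name__ == "__main__":
    kl_sde, kl_ode = kl_trajectories()
    print("SDE: KL at t=T", kl_sde[0], "-> KL at t=0", kl_sde[-1])
    print("ODE: KL at t=T", kl_ode[0], "-> KL at t=0", kl_ode[-1])
```

In this toy case the ODE step is an invertible rescaling applied to both distributions, so the KL divergence cannot change, while the noise injected at each SDE step shrinks the gap between the two marginals.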

Figures (21)

  • Figure 1: Overview of the paper. (a) Technical contributions. (b) Visualization of SDE-Drag.
  • Figure 2: Results in I2I (DiffEdit). We consider $t_0 \in \{0.3, 0.4, 0.5, 0.6, 0.7, 0.8\}$. With the same value of $t_0$ (linked by dashed lines), DiffEdit-SDE outperforms DiffEdit-ODE under all metrics.
  • Figure 3: Results in dragging. (a-c) present the preference rates (with $95 \%$ confidence intervals) of SDE-Drag over ODE-Drag, DragDiffusion, and DragGAN. SDE-Drag significantly outperforms all competitors. The blank box in (c) denotes the ratio of the open-domain images in DragBench that DragGAN cannot edit. (d) shows that the average time cost per image is comparable for all methods.
  • Figure 4: Qualitative results in image reconstruction. All reconstruction experiments use single precision except (d), which uses double precision. Because of numerical instability under classifier-free guidance (CFG), Cycle-SDE struggles to reconstruct the original image in single precision, whereas double precision ensures numerical stability.
  • Figure 5: 1D toy experiment. (a) Both the ODE and SDE samplers match the data distribution if $\tilde{p}_{T}({\bm{x}}_T) = \mathcal{N}(0, 1)$. (b-c) Under a mismatched prior, $\tilde{p}_{T}({\bm{x}}_T) \neq \mathcal{N}(0, 1)$, the ODE sampler fails to recover the data distribution while the SDE sampler succeeds.
  • ...and 16 more figures
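
The qualitative behavior described for the 1D toy experiment (Figure 5) can be probed with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: the data is an equal mixture of $\mathcal{N}(1, 0.3^2)$ and $\mathcal{N}(-1, 0.3^2)$, the forward process is a variance-preserving SDE with constant rate `BETA`, the mismatched prior is $\mathcal{N}(2, 1)$, and sampling uses plain Euler steps with the exact score. All names and constants are hypothetical. Since the data places half of its mass on each mode, the fraction of samples that end up positive is a simple proxy for how well a sampler recovers the data distribution.

```python
import math
import random

BETA = 10.0   # constant noise rate of the assumed VP process
T = 1.0       # final time; exp(-BETA * T) is tiny, so p_T is close to N(0, 1)
MU = 1.0      # data modes sit at +MU and -MU
SIGMA = 0.3   # standard deviation of each data mode


def score(x, t):
    """Exact score of p_t for the symmetric two-mode Gaussian mixture."""
    alpha = math.exp(-0.5 * BETA * t)
    var = alpha ** 2 * SIGMA ** 2 + 1.0 - alpha ** 2
    a = alpha * MU
    return (-x + a * math.tanh(a * x / var)) / var


def sample(x, stochastic, rng, steps=200):
    """Integrate from t = T down to t = 0 with Euler (ODE) or Euler-Maruyama (SDE)."""
    dt = T / steps
    for i in range(steps, 0, -1):
        s = score(x, i * dt)
        if stochastic:
            # Reverse SDE: drift -(f - g^2 * score) in reverse time, plus fresh noise.
            x += dt * (0.5 * BETA * x + BETA * s)
            x += math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
        else:
            # Probability-flow ODE: half the score term, no noise.
            x += dt * (0.5 * BETA * x + 0.5 * BETA * s)
    return x


def positive_mode_fraction(stochastic, prior_mean, n=2000, seed=0):
    """Fraction of samples ending at x > 0 when x_T ~ N(prior_mean, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x_T = rng.gauss(prior_mean, 1.0)
        hits += sample(x_T, stochastic, rng) > 0.0
    return hits / n


if __name__ == "__main__":
    # The data distribution has fraction 0.5; the prior N(2, 1) is mismatched.
    print("ODE, mismatched prior:", positive_mode_fraction(False, prior_mean=2.0))
    print("SDE, mismatched prior:", positive_mode_fraction(True, prior_mean=2.0))
```

With the mismatched prior, the deterministic sampler carries the initial bias all the way to $t = 0$ and sends most samples to the positive mode, whereas the stochastic sampler ends much closer to the balanced split of the data. Setting `prior_mean=0.0` gives the matched case (a) of the figure.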

Theorems & Definitions (13)

  • Theorem 3.1: Contraction of SDEs, see Appendix \ref{app:proof_sampler}
  • Theorem 3.2: Invariance of ODEs, see Appendix \ref{app:proof_sampler}
  • Theorem 3.3: Contraction of Cycle-SDEs, see Appendix \ref{app:proof_cycle}
  • Theorem D.1: Contraction of SDEs
  • proof
  • Theorem D.2: Invariance of ODEs
  • proof
  • Proposition D.1
  • proof
  • Lemma D.1
  • ...and 3 more