SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

Qi Qian, Haiyang Xu, Ming Yan, Juhua Hu

TL;DR

This work investigates the approximation error in DDIM inversion and proposes disentangling the guidance scale for the source and target branches, which reduces the error while keeping the original framework. The proposal improves the performance of DDIM inversion dramatically without sacrificing efficiency.

Abstract

Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, vanilla DDIM inversion is not optimized for classifier-free guidance, and the accumulated error degrades editing performance. While many algorithms have been developed to improve the framework of DDIM inversion for editing, in this work we investigate the approximation error in DDIM inversion and propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework. Moreover, a better guidance scale (i.e., 0.5) than the default settings can be derived theoretically. Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
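To make the idea of disentangled guidance scales concrete, here is a minimal sketch of the standard classifier-free guidance combination applied with a separate scale per branch. It is an illustration under stated assumptions, not the authors' implementation: the function name `guided_noise`, the constants, and the random arrays standing in for a denoiser's outputs are all hypothetical. Assigning 0.5 to the source branch and 7.5 (a common default) to the target branch is also an assumption, since the abstract does not state which branch receives which scale.

```python
import numpy as np


def guided_noise(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: start from the unconditional
    noise prediction and move toward the conditional one by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


# Assumed values for illustration only (not confirmed by the abstract).
SOURCE_SCALE = 0.5   # assumed: the derived scale, applied to the source branch
TARGET_SCALE = 7.5   # assumed: a common default, applied to the target branch

# Random arrays stand in for the outputs of a real noise-prediction network.
rng = np.random.default_rng(0)
latent_shape = (4, 8, 8)
eps_uncond = rng.standard_normal(latent_shape)       # empty-prompt prediction
eps_source = rng.standard_normal(latent_shape)       # source-prompt prediction
eps_target = rng.standard_normal(latent_shape)       # target-prompt prediction

# Each branch uses its own scale instead of one shared value.
eps_source_branch = guided_noise(eps_uncond, eps_source, SOURCE_SCALE)
eps_target_branch = guided_noise(eps_uncond, eps_target, TARGET_SCALE)
```

With a scale of 0.5 the guided prediction is simply the average of the conditional and unconditional predictions, whereas a scale above 1 extrapolates past the conditional prediction.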

Paper Structure

This paper contains 25 sections, 6 theorems, 16 equations, 13 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Assuming that the Jacobian of $\epsilon$ with respect to $z_{t-1}$ is bounded as $\|J_\epsilon(z_{t-1})\|_F\leq c$, we have

Figures (13)

  • Figure 1: Illustration of image editing by DDIM inversion and ours. $z_0$ denotes the source image. $z_0^s$ and $z_0^t$ are generated images from the source and target branches, respectively.
  • Figure 2: Illustration of image editing for random editing. The difference is highlighted by red bounding boxes.
  • Figure 3: Illustration of image editing for changing object.
  • Figure 4: Illustration of image editing for adding object.
  • Figure 5: Illustration of image editing for deleting object. The difference is highlighted by red bounding boxes.
  • ...and 8 more figures

Theorems & Definitions (12)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Corollary 1
  • Proposition 4
  • proof
  • Corollary 2
  • proof
  • proof
  • proof
  • ...and 2 more