DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

TL;DR

DiffEditor addresses the challenges of accuracy and flexibility in diffusion-based fine-grained image editing by introducing image prompts, region-aware stochastic differential equation sampling, regional gradient guidance, and a time-travel sampling strategy. The method integrates an image-prompt encoder with a pre-trained diffusion model, enabling accurate edits such as object moving, resizing, pasting, and appearance replacement without task-specific training. Through extensive experiments, it achieves state-of-the-art performance while reducing inference complexity relative to prior diffusion-based editors. The approach offers practical impact for precise, flexible editing in general images and cross-image scenarios, with code released for reproducibility.

Abstract

Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years. Although these models offer diverse, high-quality generation capabilities, translating these abilities to fine-grained image editing remains challenging. In this paper, we propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing: (1) in complex scenarios, editing results often lack accuracy and exhibit unexpected artifacts; (2) existing methods lack the flexibility to harmonize editing operations, e.g., to imagine new content. In our solution, we introduce image prompts in fine-grained image editing, which cooperate with the text prompt to better describe the editing content. To increase flexibility while maintaining content consistency, we locally incorporate stochastic differential equation (SDE) sampling into ordinary differential equation (ODE) sampling. In addition, we incorporate regional score-based gradient guidance and a time-travel strategy into the diffusion sampling, further improving the editing quality. Extensive experiments demonstrate that our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks, including editing within a single image (e.g., object moving, resizing, and content dragging) and across images (e.g., appearance replacing and object pasting). Our source code is released at https://github.com/MC-E/DragonDiffusion.
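
The idea of locally mixing SDE sampling into ODE sampling can be illustrated with a small sketch. The code below is not taken from the paper or its repository; it assumes a standard DDIM-style update in which the noise scale is multiplied by a binary region mask, so that masked-in positions are sampled stochastically and all other positions stay deterministic. The function name `regional_ddim_step`, its parameters, and the flat-list representation of the latent are hypothetical choices made for illustration.

```python
import math
import random


def regional_ddim_step(x_t, eps_pred, mask, alpha_t, alpha_prev, eta=1.0, rng=None):
    """One DDIM-style update over flat lists (illustrative sketch only).

    x_t, eps_pred, mask: equal-length lists of floats; mask holds 0 or 1.
    alpha_t, alpha_prev: cumulative alpha products at the current and the
    next (less noisy) step, with 0 < alpha_t < alpha_prev <= 1.
    """
    rng = rng or random.Random(0)
    # Standard DDIM noise scale; eta = 0 recovers the deterministic sampler.
    sigma_full = (
        eta
        * math.sqrt((1 - alpha_prev) / (1 - alpha_t))
        * math.sqrt(1 - alpha_t / alpha_prev)
    )
    out = []
    for x, e, m in zip(x_t, eps_pred, mask):
        # Assumption: noise is injected only where the mask is 1.
        sigma = sigma_full * m
        # Predicted clean sample from the current noisy value.
        x0 = (x - math.sqrt(1 - alpha_t) * e) / math.sqrt(alpha_t)
        # Deterministic direction term, shrunk to leave room for the noise.
        direction = math.sqrt(max(1 - alpha_prev - sigma ** 2, 0.0)) * e
        out.append(math.sqrt(alpha_prev) * x0 + direction + sigma * rng.gauss(0, 1))
    return out


if __name__ == "__main__":
    x = [0.5, -1.0, 0.25, 2.0]
    eps = [0.1, 0.2, -0.3, 0.4]
    region = [0, 1, 0, 1]  # only positions 1 and 3 are sampled stochastically
    print(regional_ddim_step(x, eps, region, alpha_t=0.5, alpha_prev=0.7))
```

Under these assumptions, positions outside the mask give the same result for every random seed, which is the content-consistency property the abstract refers to, while positions inside the mask vary from run to run.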

Paper Structure

This paper contains 16 sections, 8 equations, 13 figures, 2 tables, 1 algorithm.

Figures (13)

  • Figure 1: We propose DiffEditor, which can perform various fine-grained image editing operations on general images. Given an image, users can select an object to move or resize, or they can select several pixel points for more accurate content dragging. Moreover, users can also choose a reference image for cross-image editing, i.e., object pasting and appearance replacing.
  • Figure 2: Illustration of editing flexibility limitations in DragDiff and DragonDiff, as well as our improvement.
  • Figure 3: Overview of our proposed DiffEditor, which is composed of a trainable image prompt encoder and a diffusion sampling with editing guidance that does not require training.
  • Figure 4: Illustration of the design of our image prompt encoder.
  • Figure 5: The impact of different components on the editing flexibility of DragonDiff.
  • ...and 8 more figures