RegionDrag: Fast Region-Based Image Editing with Diffusion Models

Jingyi Lu; Xinghui Li; Kai Han

TL;DR

RegionDrag introduces a region-based editing framework for diffusion models that addresses the ambiguity and slow inference of point-drag methods. By using handle-target region pairs and a Region-to-Point Mapping, the method performs latent copy-paste and mutual self-attention control in a single inversion-denoising cycle, taking about 1.5 seconds per 512×512 edit. The authors extend existing datasets with region-based instructions to form the DragBench-SR and DragBench-DR benchmarks, and show faster and more faithful edits than baselines such as DragDiffusion, SDE-Drag, and DiffEditor. Ablation studies highlight the benefits of region-based inputs and of multi-step copy-paste for stability, and qualitative results illustrate reduced ambiguity and stronger region-level constraints. Overall, RegionDrag offers a practical, training-free approach to high-fidelity, region-level image editing with substantial speedups.
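To make the idea of a Region-to-Point Mapping followed by latent copy-paste concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' implementation: it assumes axis-aligned rectangular handle and target regions, pairs points by linearly rescaling relative coordinates between the two bounding boxes, and treats a single-channel 2D list as a stand-in for a diffusion latent. The names `region_to_point_map` and `copy_paste` are hypothetical.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive
Box = Tuple[int, int, int, int]
Grid = List[List[float]]
Pair = Tuple[Tuple[int, int], Tuple[int, int]]


def region_to_point_map(handle: Box, target: Box) -> List[Pair]:
    """Hypothetical bounding-box mapping: pair each target cell with a handle cell.

    Iterating over target cells (not handle cells) ensures every target cell
    receives a value even when the target box is larger than the handle box.
    """
    ht, hl, hb, hr = handle
    tt, tl, tb, tr = target
    h_h, h_w = hb - ht, hr - hl
    t_h, t_w = tb - tt, tr - tl
    if min(h_h, h_w, t_h, t_w) <= 0:
        raise ValueError("boxes must be non-empty")
    pairs: List[Pair] = []
    for y in range(tt, tb):
        for x in range(tl, tr):
            # Rescale the relative offset inside the target box to the handle
            # box and round down (nearest-neighbour style sampling).
            sy = ht + (y - tt) * h_h // t_h
            sx = hl + (x - tl) * h_w // t_w
            pairs.append(((sy, sx), (y, x)))
    return pairs


def copy_paste(latent: Grid, pairs: List[Pair]) -> Grid:
    """Copy values from handle cells to target cells (hypothetical helper)."""
    out = [row[:] for row in latent]  # write into a copy ...
    for (sy, sx), (ty, tx) in pairs:
        out[ty][tx] = latent[sy][sx]  # ... and read from the untouched input
    return out


# Usage: a 4x4 single-channel "latent" whose value encodes its position.
latent = [[float(4 * y + x) for x in range(4)] for y in range(4)]
edited = copy_paste(latent, region_to_point_map((0, 0, 2, 2), (2, 2, 4, 4)))
print(edited[2][2:], edited[3][2:])  # [0.0, 1.0] [4.0, 5.0]
```

Reading from the unmodified input avoids cascading copies when handle and target regions overlap. With a Stable Diffusion-style VAE that downsamples by 8, a 512×512 image corresponds to a 64×64 latent, so pixel-space regions would have to be scaled down before such a mapping is applied. The sketch copies at a single step and leaves the vacated handle region untouched; how the paper schedules copy-paste across denoising steps or treats the original handle area is not modeled here.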

Abstract

Point-drag-based image editing methods, like DragDiffusion, have attracted significant attention. However, point-drag-based approaches suffer from computational overhead and misinterpretation of user intentions due to the sparsity of point-based editing instructions. In this paper, we propose a region-based copy-and-paste dragging method, RegionDrag, to overcome these limitations. RegionDrag allows users to express their editing instructions in the form of handle and target regions, enabling more precise control and alleviating ambiguity. In addition, region-based operations complete editing in one iteration and are much faster than point-drag-based methods. We also incorporate the attention-swapping technique for enhanced stability during editing. To validate our approach, we extend existing point-drag-based datasets with region-based dragging instructions. Experimental results demonstrate that RegionDrag outperforms existing point-drag-based approaches in terms of speed, accuracy, and alignment with user intentions. Remarkably, RegionDrag completes the edit on an image with a resolution of 512×512 in less than 2 seconds, which is more than 100× faster than DragDiffusion, while achieving better performance. Project page: https://visual-ai.github.io/regiondrag.
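The attention-swapping idea can be illustrated with a generic, single-head sketch of mutual self-attention in the style popularized by MasaCtrl: the editing branch supplies the queries, while keys and values come from the source (inversion) branch, so edited features are rebuilt from source content. This is an assumption-laden simplification, not RegionDrag's code: it omits learned projections, multiple heads, and the choice of which layers and timesteps are swapped. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Plain scaled dot-product attention over lists of row vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


def mutual_self_attention(q_edit: Matrix, k_src: Matrix, v_src: Matrix) -> Matrix:
    # Queries from the edited branch; keys/values from the source branch.
    return attention(q_edit, k_src, v_src)


# Usage: identical keys give uniform weights, so the output is the mean value.
q_edit = [[1.0, 0.0]]
k_src = [[0.0, 0.0], [0.0, 0.0]]
v_src = [[2.0, 0.0], [4.0, 2.0]]
print(mutual_self_attention(q_edit, k_src, v_src))  # [[3.0, 1.0]]
```

Because every output row is a convex combination of source values, the edited branch cannot introduce appearance content that is absent from the source features, which is one plausible reading of why such swapping helps stability.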

Paper Structure (19 sections, 8 equations, 13 figures, 1 table, 2 algorithms)

Introduction
Related Work
RegionDrag
Preliminary
From Point-Based to Region-Based Dragging
Editing Pipeline
Experimental Results
Datasets
Evaluation Metrics
Implementation Details
Baselines
Quantitative Evaluation
Qualitative Results
Ablation Study
Conclusion
...and 4 more sections

Figures (13)

Figure 1: Comparison of editing results and latency between point-drag-based methods and our region-drag-based method. Our gradient-free, region-based framework reduces editing time from approximately one minute to about 1.5 seconds for 512×512 images, while producing results that better align with users' intentions.
Figure 1: Comparisons of our method with baseline methods using MD (×100) and LPIPS (×100) metrics on the DragBench-S(R) and DragBench-D(R) datasets. Time is measured in seconds and averaged across both datasets. The image size is 512×512.
Figure 1: Improved editing quality with a higher percentage of transformed points.
Figure 2: Overall comparison of point-based editing and region-based editing, exemplified by manipulating a bird's beak. The region-based approach is shown to provide a more user-friendly and less ambiguous editing experience.
Figure 2: Results of applying different noise weights α.
...and 8 more figures
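The MD metric reported in the comparison above is, in drag-editing benchmarks, a mean distance between where the dragged content ends up and where the user asked it to go. The sketch below shows only the averaging step and rests on assumptions: final point positions are taken as given (locating them, for example by feature matching between source and edited images, is not shown), and the optional division by image width and height is a guess at why the caption reports MD scaled by 100. `mean_distance` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def mean_distance(final_points: List[Point], target_points: List[Point],
                  width: float = 1.0, height: float = 1.0) -> float:
    """Average Euclidean distance between final and target points.

    Passing the image width/height expresses distances in normalized units
    (an assumption, not a documented benchmark setting).
    """
    if not final_points or len(final_points) != len(target_points):
        raise ValueError("need equally many, non-zero final and target points")
    total = 0.0
    for (fx, fy), (tx, ty) in zip(final_points, target_points):
        total += math.hypot((fx - tx) / width, (fy - ty) / height)
    return total / len(final_points)


print(mean_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))  # 2.5
```

Lower MD means the edit moved content closer to the requested targets; LPIPS, the other metric in the caption, instead measures perceptual change and is computed with a learned network that is not reproduced here.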
