RegionDrag: Fast Region-Based Image Editing with Diffusion Models
Jingyi Lu, Xinghui Li, Kai Han
TL;DR
RegionDrag is a region-based editing framework for diffusion models that addresses the ambiguity and slow inference of point-drag methods. Using handle-target region pairs and a Region-to-Point Mapping, the method performs latent copy-paste and mutual self-attention control in a single inversion-denoising cycle, taking about 1.5 seconds per 512×512 edit. The authors extend existing datasets with region-based instructions and propose the DragBench-SR and DragBench-DR benchmarks, on which RegionDrag produces faster and more faithful edits than baselines such as DragDiffusion, SDE-Drag, and DiffEditor. Ablation studies show that region-based inputs and multi-step copy-paste improve stability, and qualitative results illustrate reduced ambiguity and stronger region-level constraints. Overall, RegionDrag offers a practical, training-free approach to region-level image editing that is both substantially faster and more faithful than point-drag baselines.
Abstract
Point-drag-based image editing methods, like DragDiffusion, have attracted significant attention. However, point-drag-based approaches suffer from computational overhead and misinterpretation of user intentions due to the sparsity of point-based editing instructions. In this paper, we propose a region-based copy-and-paste dragging method, RegionDrag, to overcome these limitations. RegionDrag allows users to express their editing instructions in the form of handle and target regions, enabling more precise control and alleviating ambiguity. In addition, region-based operations complete editing in one iteration and are much faster than point-drag-based methods. We also incorporate the attention-swapping technique for enhanced stability during editing. To validate our approach, we extend existing point-drag-based datasets with region-based dragging instructions. Experimental results demonstrate that RegionDrag outperforms existing point-drag-based approaches in terms of speed, accuracy, and alignment with user intentions. Remarkably, RegionDrag completes the edit on an image with a resolution of 512×512 in less than 2 seconds, which is more than 100× faster than DragDiffusion, while achieving better performance. Project page: https://visual-ai.github.io/regiondrag.
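The region-based copy-and-paste idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes axis-aligned rectangular regions, nearest-neighbor sampling, and a latent stored as a (channels, height, width) array, and it omits inversion, denoising, and attention swapping entirely. The function names `region_to_point_mapping` and `latent_copy_paste` and the `(top, left, height, width)` box format are hypothetical, chosen only for illustration.

```python
import numpy as np

def region_to_point_mapping(handle_box, target_box):
    """Pair every pixel of the target box with a pixel of the handle box.

    Boxes are (top, left, height, width) tuples; this box format is an
    illustrative assumption, not the paper's region representation.
    """
    h_top, h_left, h_height, h_width = handle_box
    t_top, t_left, t_height, t_width = target_box

    # Integer grid of offsets inside the target box.
    ys, xs = np.meshgrid(np.arange(t_height), np.arange(t_width), indexing="ij")

    # Nearest-neighbor lookup: rescale each target offset into the handle box.
    src_y = h_top + np.floor((ys + 0.5) * h_height / t_height).astype(int)
    src_x = h_left + np.floor((xs + 0.5) * h_width / t_width).astype(int)
    dst_y = t_top + ys
    dst_x = t_left + xs
    return (src_y.ravel(), src_x.ravel()), (dst_y.ravel(), dst_x.ravel())

def latent_copy_paste(latent, handle_box, target_box):
    """Copy latent values from the handle region into the target region."""
    (src_y, src_x), (dst_y, dst_x) = region_to_point_mapping(handle_box, target_box)
    edited = latent.copy()  # leave the input latent untouched
    edited[:, dst_y, dst_x] = latent[:, src_y, src_x]
    return edited

# Toy latent with 4 channels on an 8x8 grid.
latent = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
same_size = latent_copy_paste(latent, (0, 0, 2, 2), (4, 4, 2, 2))
upscaled = latent_copy_paste(latent, (0, 0, 2, 2), (4, 4, 4, 4))
```

Because every target pixel receives a value in a single pass, the copy needs no iterative optimization, which is consistent with the abstract's claim that region-based operations complete editing in one iteration. A handle and target of different sizes simply stretch or shrink the copied content, as in the `upscaled` call.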
