StableDrag: Stable Dragging for Point-based Image Editing
Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, Limin Wang
TL;DR
This work tackles the unstable long-range, point-based image editing observed in DragGAN and DragDiffusion by introducing StableDrag, a framework with two key components: a discriminative point tracking method that learns a lightweight convolutional filter to reliably locate updated handle points, and a confidence-based latent enhancement strategy that keeps motion supervision complete and high-quality across all editing steps. Built on both GAN (StableDrag-GAN) and diffusion (StableDrag-Diff) models, StableDrag shows improved stability and precision on DragBench, outperforming prior methods in mean distance and image fidelity, especially for challenging or long-range manipulations. Combining a fast discriminative tracker with an adaptive supervision scheme enables more reliable, pixel-level edits with minimal runtime overhead, and the approach applies across generative paradigms.
Abstract
Point-based image editing has attracted remarkable attention since the emergence of DragGAN. Recently, DragDiffusion further pushed forward the generative quality by adapting this dragging technique to diffusion models. Despite this success, the dragging scheme exhibits two major drawbacks, namely inaccurate point tracking and incomplete motion supervision, which may result in unsatisfactory dragging outcomes. To tackle these issues, we build a stable and precise drag-based editing framework, coined StableDrag, by designing a discriminative point tracking method and a confidence-based latent enhancement strategy for motion supervision. The former allows us to precisely locate the updated handle points, thereby boosting the stability of long-range manipulation, while the latter is responsible for keeping the optimized latent as high-quality as possible across all manipulation steps. Thanks to these designs, we instantiate two types of image editing models, StableDrag-GAN and StableDrag-Diff, which attain more stable dragging performance, as shown by extensive qualitative experiments and quantitative assessment on DragBench.
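To make the idea of discriminative point tracking more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the learned filter can be modeled as a single weight vector applied at each location of a feature map (equivalent to a 1x1 convolution), that the updated handle point is the highest-scoring location in a square window around the current one, and that a sigmoid of the best score serves as a confidence value. The function name `track_handle_point`, the window `radius`, and the sigmoid confidence are all illustrative assumptions.

```python
import math


def track_handle_point(features, filt, center, radius):
    """Locate the best-matching point near `center` (hypothetical helper).

    features: H x W grid (nested lists) of C-dimensional feature vectors.
    filt:     C weights, standing in for a learned 1x1 convolutional filter.
    center:   (row, col) of the current handle point.
    radius:   half-size of the square search window (an assumption).
    Returns ((row, col), confidence) for the highest-scoring location.
    """
    height, width = len(features), len(features[0])
    cy, cx = center
    best_point, best_score = center, -math.inf

    # Score every location in the window, clipped to the feature map bounds.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            # Dot product of filter and feature vector = 1x1 conv response.
            score = sum(w * f for w, f in zip(filt, features[y][x]))
            if score > best_score:
                best_point, best_score = (y, x), score

    # Assumed confidence: squash the best response into (0, 1).
    confidence = 1.0 / (1.0 + math.exp(-best_score))
    return best_point, confidence


# Toy 5x5 map with 2-dim features; one location responds strongly.
toy = [[[0.0, 0.0] for _ in range(5)] for _ in range(5)]
toy[3][2] = [1.0, 2.0]
point, conf = track_handle_point(toy, [1.0, 1.0], center=(2, 2), radius=1)
```

In a confidence-based scheme of the kind the abstract describes, a value such as `conf` could be compared against a threshold to decide how motion supervision is applied at a given step; the specific rule is not stated here, so it is left out of the sketch.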
