AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing

DuoSheng Chen; Binghui Chen; Yifeng Geng; Liefeng Bo

TL;DR

AdaptiveDrag addresses the limited use of semantic information in prior drag-based image editing methods with a mask-free, semantics-aware editing framework. It combines an auto mask generation step (SAM-2 plus SLIC) with a semantic-driven latent optimization and a Correspondence Loss (CLoss) that stabilizes diffusion sampling, enabling accurate dragging from handle points to target points. The approach demonstrates superior precision and feature preservation on tasks such as resizing, movement, and extension, in scenes ranging from animals and faces to landscapes and clothing, and it generalizes well to new domains. Together, these contributions reduce user burden in interactive diffusion-based editing and align edits with meaningful semantic regions.

Abstract

Recently, several point-based image editing methods (e.g., DragDiffusion, FreeDrag, DragNoise) have emerged, yielding precise and high-quality results based on user instructions. However, these methods often make insufficient use of semantic information, leading to less desirable results. In this paper, we propose a novel mask-free point-based image editing method, AdaptiveDrag, which provides a more flexible editing approach and generates images that better align with user intent. Specifically, we design an auto mask generation module that uses super-pixel division, so users do not need to supply a mask. Next, we leverage a pre-trained diffusion model to optimize the latent, enabling the dragging of features from handle points to target points. To tie the drag process closely to the content of the input image, we develop a semantic-driven optimization: the step sizes adapt under the supervision of both the point positions and the semantic regions derived from super-pixel segmentation. This refined optimization also leads to more realistic and accurate drag results. Furthermore, to address the limited generative consistency of the diffusion model, we introduce a correspondence loss during the sampling process. Building on these designs, our method delivers superior generation results using only a single input image and the handle-target point pairs. Extensive experiments demonstrate that the proposed method outperforms others in handling various drag instructions (e.g., resize, movement, extension) across different domains (e.g., animals, human faces, landscapes, clothing).
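To make the idea of super-pixel-based auto mask generation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a label map in which each pixel carries a super-pixel id, and it marks as editable every super-pixel crossed by the straight segment from a handle point to its target point. The function names `grid_superpixels` and `auto_mask` are hypothetical, the square-grid partition is only a stand-in for a real super-pixel algorithm such as SLIC, and the segment-based selection rule is an assumption made for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col) pixel coordinates


def grid_superpixels(height: int, width: int, cell: int) -> List[List[int]]:
    """Stand-in for SLIC (assumption): label each pixel by its square grid cell."""
    cells_per_row = (width + cell - 1) // cell
    return [
        [(r // cell) * cells_per_row + (c // cell) for c in range(width)]
        for r in range(height)
    ]


def auto_mask(
    labels: List[List[int]],
    handles: List[Point],
    targets: List[Point],
) -> List[List[bool]]:
    """Select every super-pixel crossed by a handle -> target segment.

    Hypothetical selection rule: the paper's actual mask generation may differ.
    """
    selected = set()
    for (hr, hc), (tr, tc) in zip(handles, targets):
        # Sample the segment densely enough to visit each pixel along it.
        steps = max(abs(tr - hr), abs(tc - hc), 1)
        for i in range(steps + 1):
            r = round(hr + (tr - hr) * i / steps)
            c = round(hc + (tc - hc) * i / steps)
            selected.add(labels[r][c])
    # A pixel is editable if its super-pixel was touched by any drag segment.
    return [[lab in selected for lab in row] for row in labels]


if __name__ == "__main__":
    labels = grid_superpixels(8, 8, cell=4)
    mask = auto_mask(labels, handles=[(1, 1)], targets=[(1, 6)])
    for row in mask:
        print("".join("#" if m else "." for m in row))
```

In this toy run the drag stays within the top two grid cells, so only the upper half of the 8x8 image is marked editable. With a real super-pixel segmentation the selected regions would follow image boundaries instead of a fixed grid.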
