FreeDrag: Feature Dragging for Reliable Point-based Image Editing

Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin, Jinjin Zheng

TL;DR

FreeDrag tackles "miss tracking" and "ambiguous tracking" in point-based image editing by replacing exact point tracking with feature dragging guided by adaptive template features. It combines two components. The first is an adaptive template update, $T_i^{k+1} = \lambda_i^k \cdot F_r(h_i^k) + (1 - \lambda_i^k) \cdot T_i^k$, which blends the feature at the current searched point $h_i^k$ into the template $T_i^k$. The second is a line search with backtracking, which restricts each candidate point $q_i$ to the line from the original handle point to the target and keeps its feature distance to the updated template close to a controlled value $l$ by minimizing $\big| \big\| F_r(q_i) - T_i^{k+1} \big\|_1 - l \big|$. The approach is evaluated on StyleGAN2 and diffusion-based editors using the FreeDragBench dataset (2251 instructions), with CCSD as a symmetry-dragging metric, and shows improved editing accuracy and speed over DragGAN and DragDiffusion. The FreeDragBench suite also serves as a benchmark for point-based editing.
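The template update above is a convex blend of the old template and the feature at the current searched point. The following is a minimal sketch of that one formula, not the authors' implementation: features are plain lists of floats, the function name `update_template` and the parameter `lam` are hypothetical, and how $\lambda_i^k$ is chosen at each step is not stated in this summary, so the caller supplies it.

```python
from typing import List


def update_template(
    template: List[float],        # T_i^k, the current template feature
    handle_feature: List[float],  # F_r(h_i^k), feature at the searched point
    lam: float,                   # lambda_i^k, assumed to be supplied by the caller
) -> List[float]:
    """Return T_i^{k+1} = lam * F_r(h_i^k) + (1 - lam) * T_i^k, element-wise."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(template) != len(handle_feature):
        raise ValueError("template and handle_feature must have the same length")
    return [lam * f + (1.0 - lam) * t for f, t in zip(handle_feature, template)]


# Toy values chosen only for illustration.
T_k = [1.0, 2.0, 3.0]
F_h = [3.0, 2.0, 1.0]

kept = update_template(T_k, F_h, 0.0)      # lam = 0 keeps the old template
blended = update_template(T_k, F_h, 0.5)   # lam = 0.5 averages the two
replaced = update_template(T_k, F_h, 1.0)  # lam = 1 adopts the new feature
```

A small `lam` keeps the template stable when the content under the handle changes sharply, while a large `lam` lets the template follow the edited image more closely.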

Abstract

To serve the intricate and varied demands of image editing, precise and flexible manipulation of image content is indispensable. Recently, drag-based editing methods have achieved impressive performance. However, these methods predominantly center on point dragging, resulting in two noteworthy drawbacks: "miss tracking", where it is difficult to accurately track the predetermined handle points, and "ambiguous tracking", where tracked points may land in wrong regions that closely resemble the handle points. To address these issues, we propose FreeDrag, a feature dragging methodology designed to relieve the burden of point tracking. FreeDrag incorporates two key designs: template features with adaptive updating, and line search with backtracking. The former improves stability against drastic content changes by carefully controlling the feature updating scale after each drag, while the latter alleviates the misguidance from similar points by actively restricting the search area to a line. Together, these two techniques yield more stable semantic dragging with higher efficiency. Comprehensive experimental results substantiate that our approach significantly outperforms pre-existing methodologies, offering reliable point-based editing even in various complex scenarios.
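Restricting the search area to a line can be illustrated with a small sketch. It samples candidate points on the segment from the original handle point to the target and picks the one whose L1 feature distance to the template is closest to a controlled value `l`, following the objective stated in the TL;DR. This is an illustration under assumptions, not the authors' code: `line_search`, `feature_at`, and `num_steps` are hypothetical names, the feature lookup is a toy function, and the backtracking rule is omitted because this summary does not specify it.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 norm of the difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def line_search(
    p0: Point,                                 # original handle point
    target: Point,                             # target point t_i
    template: Sequence[float],                 # template feature T_i^{k+1}
    feature_at: Callable[[Point], List[float]],  # hypothetical stand-in for F_r
    l: float,                                  # controlled feature distance
    num_steps: int = 11,                       # assumed sampling density
) -> Point:
    """Pick the sampled point on segment p0 -> target minimizing | ||F_r(q) - T||_1 - l |."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    best_q, best_cost = p0, float("inf")
    for j in range(num_steps):
        # Candidates are confined to the line, which excludes look-alike
        # regions elsewhere in the image.
        q = (
            p0[0] + (target[0] - p0[0]) * j / (num_steps - 1),
            p0[1] + (target[1] - p0[1]) * j / (num_steps - 1),
        )
        cost = abs(l1_distance(feature_at(q), template) - l)
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q


def toy_feature(q: Point) -> List[float]:
    """Toy feature map for illustration only: the feature is the coordinate itself."""
    return [q[0], q[1]]


start, goal = (0.0, 0.0), (10.0, 0.0)
best = line_search(start, goal, toy_feature(start), toy_feature, l=3.0)
```

With the toy feature map, the feature distance from the start grows linearly along the line, so the search settles on the sampled point whose distance equals `l`.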

Paper Structure (19 sections, 10 equations, 16 figures, 4 tables)

This paper contains 19 sections, 10 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Comparison between the feature-centric FreeDrag and the point-based DragGAN [pan2023drag] and DragDiffusion [dragdiffusion]. Given an input image, users assign handle points (red) and target points (blue) to move the semantic positions of the handle points to the corresponding target points. Users can also provide an optional mask to specify the editing region.
  • Figure 2: Miss tracking in DragGAN [pan2023drag] due to a drastic change in layout (first and second rows) and the disappearance of handle points (third and last rows).
  • Figure 3: Ambiguous tracking in DragGAN [pan2023drag] due to the existence of similar structures.
  • Figure 4: Concept illustration of the point dragging pipeline. $p_i^{k}$ denotes the tracked position of the $i$-th handle point in the $k$-th motion ($p_i^{0}=p_i$), and $t_i$ indicates the corresponding $i$-th target point.
  • Figure 5: Illustration of the proposed feature dragging pipeline. $h_i^{k}$ denotes the searched point in the $k$-th drag, which lies on the line formed by $p_i^{0}$ and $t_i$, and $T_i^{k}$ denotes the corresponding template feature. (a) Concept of feature dragging. (b) The coupled movement under multi-point dragging. (c) Visualization of Eq. \ref{eq.whole_localization}.
  • ...and 11 more figures