FastDrag: Manipulate Anything in One Step

Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng

TL;DR

This paper introduces FastDrag, a one-step drag-based image editing method that performs latent semantic optimization in a single step and thereby significantly speeds up editing.

Abstract

Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt $n$-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process. Central to our approach is a latent warpage function (LWF), which simulates the behavior of a stretched material to adjust the location of individual pixels within the latent space. This innovation achieves one-step latent semantic optimization and hence significantly increases editing speed. Meanwhile, null regions emerging after applying LWF are addressed by our proposed bilateral nearest neighbor interpolation (BNNI) strategy. This strategy interpolates these regions using similar features from neighboring areas, thus enhancing semantic integrity. Additionally, a consistency-preserving strategy is introduced to maintain the consistency between the edited and original images by adopting semantic information from the original image, saved as key and value pairs in the self-attention module during diffusion inversion, to guide the diffusion sampling. FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods while achieving enhanced editing performance. Project page: https://fastdrag-site.github.io/
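
To make the BNNI idea concrete, the following is a minimal sketch of filling null regions from neighboring features. It is not the authors' implementation: the abstract only says that null regions are interpolated from similar neighboring features, so the four axis-aligned search directions, the inverse-distance weights, the scalar cells (a real latent holds a feature vector per position), and the names `nearest_valid` and `fill_null_regions` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]

# Assumed search directions: up, down, left, right.
DIRECTIONS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def nearest_valid(grid: Grid, r: int, c: int, dr: int, dc: int) -> Optional[Tuple[float, int]]:
    """Walk from (r, c) along (dr, dc); return (value, distance) of the first non-null cell."""
    rows, cols = len(grid), len(grid[0])
    rr, cc, dist = r + dr, c + dc, 1
    while 0 <= rr < rows and 0 <= cc < cols:
        value = grid[rr][cc]
        if value is not None:
            return value, dist
        rr, cc, dist = rr + dr, cc + dc, dist + 1
    return None


def fill_null_regions(grid: Grid) -> Grid:
    """Fill each null (None) cell with an inverse-distance weighted mean of its nearest valid neighbors."""
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell is not None:
                continue
            # Search the ORIGINAL grid so already-filled cells never feed later ones.
            hits = [nearest_valid(grid, r, c, dr, dc) for dr, dc in DIRECTIONS]
            hits = [h for h in hits if h is not None]
            if not hits:
                continue  # no valid neighbor in any direction: leave the cell null
            weight_sum = sum(1.0 / dist for _, dist in hits)
            out[r][c] = sum(value / dist for value, dist in hits) / weight_sum
    return out
```

With these assumed weights, a one-row grid `[[0.0, None, None, 3.0]]` is filled to `[[0.0, 1.0, 2.0, 3.0]]`: closer neighbors contribute more, so the gap is bridged smoothly rather than copied from one side.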

Paper Structure (24 sections, 13 equations, 17 figures)

Figures (17)

  • Figure 1: (a) Existing methods usually require multiple iterations to transform an image from its original semantics to the desired semantics; (b) Our method uses the latent warpage function (LWF) to calculate the warpage vectors (i.e., $\boldsymbol{v}_j$) that move each individual pixel on the feature map, achieving semantic optimization in one step.
  • Figure 2: Overall framework of FastDrag with four phases: diffusion inversion, diffusion sampling, one-step warpage optimization, and BNNI. Diffusion inversion yields a noisy latent $\boldsymbol{z}_t$, and diffusion sampling reconstructs the image from the optimized noisy latent $\boldsymbol{z}'_t$. One-step warpage optimization is used for noisy latent optimization, where LWF is proposed to generate warpage vectors that adjust the location of individual pixels on the noisy latent with a simple latent relocation operation. BNNI is used to enhance the semantic integrity of the noisy latent. A consistency-preserving strategy is introduced to maintain the consistency between the original image and the edited image.
  • Figure 3: Geometric representation of $\boldsymbol{v}_j^{i*}$. Circle $O$ is the circumscribed circle of the circumscribed rectangle enclosing the mask's shape. $p_j$ is the feature point requiring relocation, and $p_j^{i*}$ is its new position following the drag instruction $\boldsymbol{d}_{i}$.
  • Figure 4: Illustration of bilateral nearest neighbor interpolation.
  • Figure 5: Illustration of consistency-preserving strategy.
  • ...and 12 more figures
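
The consistency-preserving strategy (Figure 5) can be illustrated with a small sketch in which self-attention during sampling uses keys and values cached during inversion. This is a sketch under stated assumptions, not the authors' code: the abstract only says that key and value pairs from the self-attention module are saved during diffusion inversion and used to guide sampling. The `KVCache` class, its `(timestep, layer)` indexing, and the function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


class KVCache:
    """Hypothetical store for keys/values recorded during inversion, indexed by (timestep, layer)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, str], Tuple[Matrix, Matrix]] = {}

    def save(self, timestep: int, layer: str, keys: Matrix, values: Matrix) -> None:
        self._store[(timestep, layer)] = (keys, values)

    def load(self, timestep: int, layer: str) -> Tuple[Matrix, Matrix]:
        return self._store[(timestep, layer)]


def consistent_self_attention(queries: Matrix, cache: KVCache, timestep: int, layer: str) -> Matrix:
    """Attend from the edited latent's queries to the ORIGINAL image's cached keys and values."""
    keys, values = cache.load(timestep, layer)
    return attention(queries, keys, values)
```

Because every output row is a convex combination of the cached values, the edited image can only draw on features that were present in the original image at that layer and timestep, which is one way to read "adopting semantic information from the original image".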