RotationDrag: Point-based Image Editing with Rotated Diffusion Features

Minxing Luo, Wentao Cheng, Jian Yang

TL;DR

This paper tackles the challenge of point-based image editing with diffusion models under in-plane rotation. It introduces RotationDrag, which leverages feature maps from rotated inputs to stabilize handle-point tracking and motion supervision, addressing the observed instability of UNet features under rotation. A new RotateBench dataset is proposed to evaluate rotation-focused editing on real and generated images, complemented by a comprehensive user study showing RotationDrag's superior performance over DragDiffusion, FreeDrag, and SDE-Drag. The approach improves rotation-aware editing fidelity and precision, offering a practical enhancement for diffusion-based interactive image editing, albeit with slower runtime due to per-step inversions. The work paves the way for rotation-aware diffusion editing and provides benchmarks and guidance for future improvements in feature-space tracking under geometric transformations.

Abstract

Precise and user-friendly manipulation of image content while preserving image fidelity has always been crucial to the field of image editing. Thanks to the power of generative models, recent point-based image editing methods allow users to interactively change image content with high generalizability by clicking several control points. However, this editing process usually assumes that features stay constant during the motion supervision step from the initial points to the target points. In this work, we conduct a comprehensive investigation of the feature space of diffusion models and find that features change acutely under in-plane rotation. Based on this finding, we propose a novel approach named RotationDrag, which significantly improves point-based image editing performance when users intend to rotate the image content in-plane. Our method tracks handle points more precisely by utilizing the feature maps of rotated images, thus ensuring precise optimization and high image fidelity. Furthermore, we build RotateBench, a benchmark focused on in-plane rotation and the first to evaluate point-based image editing methods under in-plane rotation scenarios on both real and generated images. A thorough user study demonstrates our method's superior capability in accomplishing the in-plane rotation that users intend to achieve, compared with the DragDiffusion baseline and other existing diffusion-based methods. See the project page https://github.com/Tony-Lowe/RotationDrag for code and experiment results.
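
The claim that features change under in-plane rotation can be illustrated with a toy example. The sketch below is not the paper's UNet analysis: it assumes a hypothetical one-filter "feature extractor" (a horizontal pixel difference) and a 4x4 image containing a single vertical edge. It only shows that a feature extractor which is not rotation-equivariant can respond strongly to a structure and not at all to the same structure after a 90-degree rotation.

```python
def rot90(grid):
    """Rotate a 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def horizontal_gradient(grid):
    """Toy stand-in for a feature extractor (assumption, not a UNet):
    difference between horizontally adjacent pixels."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in grid]


def energy(features):
    """Total absolute feature response."""
    return sum(abs(v) for row in features for v in row)


# A 4x4 image with one vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]

energy_original = energy(horizontal_gradient(image))
energy_rotated = energy(horizontal_gradient(rot90(image)))

print(energy_original, energy_rotated)
```

In this toy setting the edge is fully visible to the filter before rotation and invisible afterwards, so a feature vector recorded at a handle point in the original image would no longer match the same content once it has been rotated.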

Paper Structure

This paper contains 16 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: RotationDrag significantly improves point-based editing performance in rotation scenarios. Given an input image, the user provides pairs of handle points (red) and target points (blue), along with a mask that determines the editing region.
  • Figure 2: Overview of RotationDrag. Given an input image, we first obtain its latent code through DDIM (Song et al., 2020) inversion. We then optimize the latent code step by step. During optimization, the latent code of the rotated image is used for UNet feature extraction, ensuring more reliable point tracking. When optimization is finished, the latent code is passed through the DDIM denoiser to produce the edited image.
  • Figure 3: Visual comparison between DragDiffusion, FreeDrag (our diffusion-based implementation), SDE-Drag, and RotationDrag. The left column displays the input images, while the columns on the right display the editing results of DragDiffusion, our diffusion-based implementation of FreeDrag, FreeDrag, SDE-Drag, and RotationDrag, respectively.
  • Figure 4: Results of user study. RotationDrag outperforms all competitors by a large margin.
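
The point-tracking step mentioned in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rotate_point` and `track_handle` are hypothetical helper names, the feature map is a plain nested list `feature_map[y][x]` of feature vectors, and the search is an L1 nearest-neighbour lookup in a square window. In a rotation-aware setup, the `reference` vector would be read from the feature map of the rotated image at the rotated handle location, rather than from the unrotated input.

```python
import math


def rotate_point(point, center, angle_deg):
    """Rotate (x, y) about center by angle_deg.
    Counter-clockwise with the y axis pointing up; with image
    coordinates (y pointing down) the same formula appears clockwise."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))


def track_handle(feature_map, reference, start, radius):
    """Return the (x, y) inside a square window around start whose
    feature vector has the smallest L1 distance to reference."""
    height, width = len(feature_map), len(feature_map[0])
    best, best_dist = start, float("inf")
    for y in range(max(0, start[1] - radius), min(height, start[1] + radius + 1)):
        for x in range(max(0, start[0] - radius), min(width, start[0] + radius + 1)):
            dist = sum(abs(f - r) for f, r in zip(feature_map[y][x], reference))
            if dist < best_dist:
                best, best_dist = (x, y), dist
    return best


# Synthetic 5x5 feature map whose vector at (x, y) is simply [x, y].
feature_map = [[[float(x), float(y)] for x in range(5)] for y in range(5)]
new_handle = track_handle(feature_map, reference=[3.0, 1.0], start=(2, 2), radius=2)
print(new_handle)
```

The window search keeps tracking local, so the handle point can only move a few pixels per optimization step; the quality of the match then depends entirely on how well `reference` describes the content at its current orientation.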