RotationDrag: Point-based Image Editing with Rotated Diffusion Features
Minxing Luo, Wentao Cheng, Jian Yang
TL;DR
This paper tackles the challenge of point-based image editing with diffusion models under in-plane rotation. It introduces RotationDrag, which leverages feature maps from rotated inputs to stabilize handle-point tracking and motion supervision, addressing the observed instability of UNet features under rotation. A new RotateBench dataset is proposed to evaluate rotation-focused editing on real and generated images, complemented by a comprehensive user study showing RotationDrag's superior performance over DragDiffusion, FreeDrag, and SDE-Drag. The approach improves rotation-aware editing fidelity and precision, offering a practical enhancement for diffusion-based interactive image editing, albeit with slower runtime due to per-step inversions. The work paves the way for rotation-aware diffusion editing and provides benchmarks and guidance for future improvements in feature-space tracking under geometric transformations.
Abstract
Precise and user-friendly manipulation of image content while preserving image fidelity has always been crucial to the field of image editing. Thanks to the power of generative models, recent point-based image editing methods allow users to interactively change image content with high generalizability by clicking several control points. However, this editing process usually assumes that features stay constant during the motion supervision step from the initial points to the target points. In this work, we conduct a comprehensive investigation of the feature space of diffusion models and find that features change acutely under in-plane rotation. Based on this finding, we propose a novel approach named RotationDrag, which significantly improves point-based image editing performance when users intend to rotate image content in-plane. Our method tracks handle points more precisely by utilizing the feature map of the rotated images, thus ensuring precise optimization and high image fidelity. Furthermore, we build an in-plane-rotation-focused benchmark called RotateBench, the first benchmark to evaluate point-based image editing methods under in-plane rotation on both real and generated images. A thorough user study demonstrates that our method is superior at accomplishing the in-plane rotation users intend, compared with the DragDiffusion baseline and other existing diffusion-based methods. See the project page https://github.com/Tony-Lowe/RotationDrag for code and experiment results.
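To make the handle-point tracking idea concrete, the following is a minimal sketch of nearest-neighbor point tracking in feature space, the generic form this step commonly takes in drag-style editors. It is not the authors' implementation. The names `rotate_point` and `track_handle_point`, the `radius` parameter, the L1 distance, and the `(C, H, W)` NumPy feature layout are all assumptions made for illustration. Per the abstract, RotationDrag would take the reference feature from the feature map of the rotated image; how that map is computed is outside this sketch.

```python
import numpy as np


def rotate_point(point, center, angle_deg):
    """Rotate an (x, y) point about `center` by `angle_deg` degrees.

    Uses the standard 2D rotation matrix. With image coordinates
    (y pointing down), a positive angle appears clockwise on screen.
    """
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)


def track_handle_point(feat, ref_vec, prev_xy, radius=3):
    """Hypothetical tracking helper (illustrative, not from the paper).

    Searches a square window around `prev_xy` in `feat` (shape C x H x W)
    and returns the (x, y) pixel whose feature vector has the smallest
    L1 distance to the reference vector `ref_vec` (shape C).
    """
    _, height, width = feat.shape
    x0, y0 = prev_xy
    # Clamp the search window to the feature map bounds.
    xs = slice(max(x0 - radius, 0), min(x0 + radius + 1, width))
    ys = slice(max(y0 - radius, 0), min(y0 + radius + 1, height))
    window = feat[:, ys, xs]  # shape (C, h, w)
    dist = np.abs(window - ref_vec[:, None, None]).sum(axis=0)
    iy, ix = np.unravel_index(np.argmin(dist), dist.shape)
    return (int(xs.start + ix), int(ys.start + iy))
```

In this sketch, the point that changes under a rotation-aware scheme is the source of `ref_vec`: if features shift under in-plane rotation, a reference taken from the unrotated image can pull the search toward the wrong pixel.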
