Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
Zeqian Long, Mingzhe Zheng, Kunyu Feng, Xinhua Zhang, Hongyu Liu, Harry Yang, Linfeng Zhang, Qifeng Chen, Yue Ma
TL;DR
Follow-Your-Shape tackles large-scale, shape-aware image editing with a training-free, mask-free approach. It introduces a Trajectory Divergence Map (TDM) that quantifies token-wise velocity differences between the inversion and editing trajectories. It then employs a three-stage editing process with scheduled KV injection and ControlNet conditioning to localize edits and preserve the background. The key contributions are TDM-based region localization, a staged editing strategy that stabilizes the trajectories, and the ReShapeBench benchmark for rigorous evaluation of shape transformations. On ReShapeBench, the method reports state-of-the-art background preservation and text alignment, with PSNR $= 35.79$, LPIPS $= 8.23\times 10^{-3}$, CLIP-Sim $= 33.71$, and Aesthetic Score $= 6.57$. The framework enables robust, large-scale shape edits in a computationally efficient, training-free manner, which makes it practical for precise content modification in real-world applications. ReShapeBench also provides a standardized benchmark for future shape-aware editing research.
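The TDM idea can be illustrated with a minimal sketch. Purely for illustration, it assumes that each trajectory supplies one velocity vector per image token at a given timestep, that divergence is the per-token L2 distance, that the map is min-max normalized to [0, 1], and that a fixed threshold turns it into a binary editable-region mask. The paper's actual distance, normalization, aggregation over timesteps, and thresholding may differ, and the function names `trajectory_divergence_map`, `average_maps`, and `editable_mask` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def trajectory_divergence_map(v_inv: Sequence[Vector], v_edit: Sequence[Vector]) -> List[float]:
    """Hypothetical TDM: per-token L2 distance between two velocity fields, min-max normalized."""
    if len(v_inv) != len(v_edit):
        raise ValueError("both velocity fields must have the same number of tokens")
    # Token-wise velocity difference between the inversion and editing trajectories.
    dists = [math.dist(a, b) for a, b in zip(v_inv, v_edit)]
    lo, hi = min(dists), max(dists)
    if hi - lo < 1e-12:
        # No token diverges more than any other: nothing stands out as editable.
        return [0.0 for _ in dists]
    return [(d - lo) / (hi - lo) for d in dists]


def average_maps(maps: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of per-timestep maps (an assumed aggregation rule)."""
    if not maps:
        raise ValueError("need at least one map")
    n = len(maps)
    return [sum(vals) / n for vals in zip(*maps)]


def editable_mask(tdm: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Tokens whose normalized divergence reaches the (assumed) threshold are treated as editable."""
    return [x >= threshold for x in tdm]


if __name__ == "__main__":
    v_inv = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
    v_edit = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [2.0, 2.0]]
    tdm = trajectory_divergence_map(v_inv, v_edit)
    print(editable_mask(tdm))  # [False, False, True, False]
```

In this toy example only the third token's velocity changes between the two trajectories, so it alone is marked editable. This matches the intuition that regions the prompt change actually affects are where the trajectories diverge.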
Abstract
While recent flow-based image editing models demonstrate general-purpose capabilities across diverse tasks, they often struggle in challenging scenarios, particularly those involving large-scale shape transformations. When performing such structural edits, these methods either fail to achieve the intended shape change or inadvertently alter non-target regions, degrading background quality. We propose Follow-Your-Shape, a training-free and mask-free framework that supports precise and controllable editing of object shapes while strictly preserving non-target content. Motivated by the divergence between inversion and editing trajectories, we compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. The TDM enables precise localization of editable regions and guides a Scheduled KV Injection mechanism that ensures stable and faithful editing. To facilitate rigorous evaluation, we introduce ReShapeBench, a benchmark comprising 120 new images and enriched prompt pairs specifically curated for shape-aware editing. Experiments demonstrate that our method achieves superior editability and visual fidelity, particularly in tasks requiring large-scale shape replacement.
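The abstract states that the TDM guides a Scheduled KV Injection mechanism but does not spell out its rules. The sketch below shows one hypothetical reading, not the paper's schedule. During an assumed window of denoising steps, tokens outside the TDM-derived editable region take their attention keys and values from the source (inversion) branch, while editable tokens keep those of the editing branch. Outside the window, no injection happens. The window, the hard per-token switch, and the name `inject_kv` are all illustrative assumptions.

```python
from typing import Container, List, Sequence, Tuple

Matrix = List[List[float]]


def inject_kv(
    k_edit: Sequence[Sequence[float]],
    v_edit: Sequence[Sequence[float]],
    k_src: Sequence[Sequence[float]],
    v_src: Sequence[Sequence[float]],
    editable: Sequence[bool],
    step: int,
    inject_steps: Container[int],
) -> Tuple[Matrix, Matrix]:
    """Hypothetical mask-gated KV injection for one attention layer at one denoising step.

    Rows are tokens. For steps in `inject_steps`, non-editable tokens copy keys/values
    from the source branch (to preserve the background); editable tokens keep the
    editing branch's keys/values. For all other steps the edit branch is returned as-is.
    """
    n = len(editable)
    if not (len(k_edit) == len(v_edit) == len(k_src) == len(v_src) == n):
        raise ValueError("all inputs must have one row per token")
    if step not in inject_steps:
        # Outside the assumed injection window: no source features are injected.
        return [list(r) for r in k_edit], [list(r) for r in v_edit]
    k_out: Matrix = []
    v_out: Matrix = []
    for ke, ve, ks, vs, is_editable in zip(k_edit, v_edit, k_src, v_src, editable):
        k_out.append(list(ke if is_editable else ks))
        v_out.append(list(ve if is_editable else vs))
    return k_out, v_out


if __name__ == "__main__":
    k, v = inject_kv(
        [[1.0], [2.0]], [[3.0], [4.0]],  # editing-branch K, V
        [[9.0], [8.0]], [[7.0], [6.0]],  # source-branch K, V
        editable=[True, False], step=1, inject_steps=range(0, 5),
    )
    print(k, v)  # [[1.0], [8.0]] [[3.0], [6.0]]
```

A hard switch keeps the sketch easy to read. A soft variant could instead blend source and edit features with weights taken directly from the continuous TDM values, which would avoid visible seams at the region boundary.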
