DragText: Rethinking Text Embedding in Point-based Image Editing
Gayoon Choi, Taejin Jeong, Sujung Hong, Seong Jae Hwang
TL;DR
DragText addresses the problem that static text embeddings hinder point-based diffusion editing by causing drag halting and semantic drift. It proposes a joint optimization framework that updates the text embedding in parallel with image dragging and adds a regularization term to preserve the original prompt, so it can be plugged into existing diffusion-based drag methods. The approach yields consistent improvements in dragging accuracy and content preservation across methods, validated by qualitative results and by metrics such as mean distance (MD) and the product LPIPS×MD, while also enabling controllable manipulation of both image and text embeddings. The work highlights the critical role of text-image coupling in interactive editing and suggests broader implications for text-conditioned diffusion systems and prompt-aware editing pipelines.
Abstract
Point-based image editing enables accurate and flexible control through content dragging. However, the role of the text embedding during the editing process has not been thoroughly investigated. A significant aspect that remains unexplored is the interaction between text and image embeddings. During progressive editing in a diffusion model, the text embedding remains constant. As the image embedding increasingly diverges from its initial state, the discrepancy between the image and text embeddings presents a significant challenge. In this study, we found that the text prompt significantly influences the dragging process, particularly in maintaining content integrity and achieving the desired manipulation. Building on these insights, we propose DragText, which optimizes the text embedding in conjunction with the dragging process so that it stays paired with the modified image embedding. Simultaneously, we regularize the text optimization process to preserve the integrity of the original text prompt. Our approach can be seamlessly integrated with existing diffusion-based drag methods, enhancing performance with only a few lines of code.
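The idea of optimizing the text embedding alongside the drag, while regularizing it toward the original prompt, can be illustrated with a deliberately simplified toy problem. The sketch below is not the DragText implementation: the quadratic losses, the function name `optimize`, and the parameters `lam`, `lr`, and `steps` are all assumptions chosen for illustration. A "latent" z is pulled toward a drag target but is also tied to a "text embedding" c through a mismatch term. When c is frozen, z stalls partway; when c is updated jointly, with a penalty keeping it near its initial value c0, z moves further toward the target.

```python
def optimize(z0, c0, target, lam, update_text, lr=0.1, steps=500):
    """Toy gradient descent (hypothetical stand-in, not the paper's losses).

    Minimizes, per dimension i:
        (z_i - t_i)^2            # drag term: pull latent toward the target
      + (z_i - c_i)^2            # mismatch term: latent and text stay coupled
      + lam * (c_i - c0_i)^2     # regularizer: text stays near the original
    """
    z, c = list(z0), list(c0)
    for _ in range(steps):
        grad_z = [2 * (zi - ti) + 2 * (zi - ci)
                  for zi, ti, ci in zip(z, target, c)]
        grad_c = [-2 * (zi - ci) + 2 * lam * (ci - c0i)
                  for zi, ci, c0i in zip(z, c, c0)]
        z = [zi - lr * g for zi, g in zip(z, grad_z)]
        if update_text:
            # Joint update: the text embedding follows the edited latent.
            c = [ci - lr * g for ci, g in zip(c, grad_c)]
    return z, c


z_init = [0.0, 0.0]
c_init = [0.0, 0.0]
target = [1.0, 1.0]

# Frozen text embedding: the latent halts halfway to the target.
z_static, c_static = optimize(z_init, c_init, target, lam=0.5, update_text=False)

# Jointly optimized, regularized text embedding: the latent gets closer.
z_joint, c_joint = optimize(z_init, c_init, target, lam=0.5, update_text=True)
```

In this toy setting the frozen-text run settles at z = 0.5 in each dimension, while the joint run settles at z = 0.75 with c = 0.5. A larger `lam` keeps c closer to c0 at the cost of a shorter drag, which mirrors, in a very loose sense, the trade-off between dragging accuracy and prompt preservation described in the abstract.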
