Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang
TL;DR
The paper tackles mask-free, text-driven local image editing by learning the edit region as a bounding box guided by the language prompt. It introduces a region generation network that proposes bounding boxes around anchor points derived from self-attention maps, and it plugs into pre-trained editors such as MaskGIT and Stable Diffusion. Training uses a CLIP-based objective, $L = \lambda_C L_{Clip} + \lambda_S L_{Str} + \lambda_D L_{Dir}$, and inference ranks candidate edits with the score $S = \alpha S_{t2i} + \beta S_{i2i}$. Both the $L$ terms and the $S$ terms are computed in CLIP space, so that edits align with the textual description while source content is preserved where appropriate. The approach produces high-fidelity edits that follow complex prompts, as shown by qualitative results and a user study in which it compared favorably with several state-of-the-art baselines. Because it needs no user-provided mask and is compatible with multiple editing models, the method is a practical route to language-guided local image editing.
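The weighted objective $L$ and the ranking score $S$ are both simple linear combinations, which a short sketch can make concrete. The Python below is illustrative only and is not the authors' implementation: the function names, the default weights, and the toy 2-D vectors are hypothetical, and it assumes that $S_{t2i}$ is the cosine similarity between the text embedding and the edited-image embedding and that $S_{i2i}$ is the cosine similarity between the source-image and edited-image embeddings. In practice the embeddings would come from a CLIP encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def total_loss(l_clip, l_str, l_dir, lam_c=1.0, lam_s=1.0, lam_d=1.0):
    """L = lam_c * L_Clip + lam_s * L_Str + lam_d * L_Dir.

    The individual loss values are taken as given; the default weights
    are placeholders, not values from the paper.
    """
    return lam_c * l_clip + lam_s * l_str + lam_d * l_dir


def rank_candidates(text_emb, source_emb, candidate_embs, alpha=1.0, beta=1.0):
    """Score each candidate edit with S = alpha * S_t2i + beta * S_i2i.

    Assumption: S_t2i compares the text with the edited image, and
    S_i2i compares the source image with the edited image.
    Returns the index of the best candidate and the list of scores.
    """
    scores = [
        alpha * cosine(text_emb, cand) + beta * cosine(source_emb, cand)
        for cand in candidate_embs
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 2-D "embeddings" standing in for CLIP features (hypothetical data).
text_emb = [1.0, 0.0]
source_emb = [0.0, 1.0]
candidates = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

best_equal, _ = rank_candidates(text_emb, source_emb, candidates)
best_text_heavy, _ = rank_candidates(
    text_emb, source_emb, candidates, alpha=2.0, beta=0.5
)
```

With equal weights, the candidate that balances prompt alignment against similarity to the source scores highest; raising $\alpha$ relative to $\beta$ shifts the choice toward the candidate that best matches the prompt.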
Abstract
Language has emerged as a natural interface for image editing. In this paper, we introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches. Specifically, our approach leverages an existing pre-trained text-to-image model and introduces a bounding box generator to identify the editing regions that are aligned with the textual prompts. We show that this simple approach enables flexible editing that is compatible with current image generation models, and is able to handle complex prompts featuring multiple objects, complex sentences, or lengthy paragraphs. We conduct an extensive user study to compare our method against state-of-the-art methods. The experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions. Our project webpage can be found at: https://yuanze-lin.me/LearnableRegions_page.
