LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Achint Soni; Meet Soni; Sirisha Rambhatla

TL;DR

LOCATEdit addresses imprecise cross-attention masks in diffusion-based image editing by refining them with CASA graphs and a graph Laplacian regularizer that enforces spatial coherence. It constructs CASA graphs from cross- and self-attention maps and solves a graph Laplacian optimization in closed form to produce a refined, localized editing mask, $\mathbf{m}^* = (\mathbf{\Lambda} + \lambda \mathbf{L})^{-1} \mathbf{\Lambda} \mathbf{m}_0$. The method combines this refinement with selective embedding interpolation via an IP-Adapter in a dual-branch, training-free editing framework, keeping edits confined to target regions while preserving background structure. Experiments on PIE-Bench show state-of-the-art structure fidelity and CLIP-based alignment across diverse scenes.
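The closed-form expression in the TL;DR can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes that $\mathbf{m}_0$ is a flattened initial attention mask, $\mathbf{L} = \mathbf{D} - \mathbf{W}$ is the Laplacian of a symmetric patch-affinity graph $\mathbf{W}$, and $\mathbf{\Lambda}$ is a diagonal matrix of positive per-patch confidence weights. Under those assumptions, $\mathbf{m}^*$ is the minimizer of $(\mathbf{m}-\mathbf{m}_0)^\top \mathbf{\Lambda} (\mathbf{m}-\mathbf{m}_0) + \lambda\, \mathbf{m}^\top \mathbf{L} \mathbf{m}$. The function name `refine_mask`, the toy chain graph, and the parameter values are hypothetical.

```python
import numpy as np

def refine_mask(m0, W, confidence, lam=1.0):
    """Solve (Lambda + lam * L) m = Lambda m0 for the refined mask m.

    Assumptions (not taken from the paper): W is a non-negative patch-affinity
    matrix, `confidence` holds strictly positive per-patch weights, lam >= 0.
    """
    m0 = np.asarray(m0, dtype=float)
    W = np.asarray(W, dtype=float)
    W = 0.5 * (W + W.T)              # symmetrize (returns a new array)
    np.fill_diagonal(W, 0.0)         # drop self-loops
    L = np.diag(W.sum(axis=1)) - W   # combinatorial graph Laplacian D - W
    Lam = np.diag(np.asarray(confidence, dtype=float))
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(Lam + lam * L, Lam @ m0)

# Toy example: four patches connected in a chain, with a noisy initial mask.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m0 = np.array([1.0, 0.0, 1.0, 0.0])
confidence = np.ones(4)
m_star = refine_mask(m0, W, confidence, lam=0.5)
print(m_star)
```

With `lam=0` the mask is returned unchanged. Larger values pull neighboring patches toward a common value, which is the spatial-coherence effect the regularizer is meant to provide.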

Abstract

Text-guided image editing aims to modify specific regions of an image according to natural language instructions while preserving the overall structure and background fidelity. Existing methods use masks derived from the cross-attention maps of diffusion models to identify the target regions for modification. However, because cross-attention mechanisms focus on semantic relevance, they struggle to maintain image integrity. As a result, these methods often lack spatial consistency, leading to editing artifacts and distortions. In this work, we address these limitations and introduce LOCATEdit, which enhances cross-attention maps through a graph-based approach that uses self-attention-derived patch relationships to maintain smooth, coherent attention across image regions. This ensures that alterations are limited to the designated items while the surrounding structure is retained. LOCATEdit consistently and substantially outperforms existing baselines on PIE-Bench, demonstrating state-of-the-art performance across a variety of editing tasks. Code is available at https://github.com/LOCATEdit/LOCATEdit/
