Noise Map Guidance: Inversion with Spatial Context for Real Image Editing
Hansam Cho, Jonghyun Lee, Seoung Bum Kim, Tae-Hyun Oh, Yonghyun Jeong
TL;DR
Noise Map Guidance (NMG) tackles real-image editing with text-guided diffusion models by introducing an optimization-free inversion that leverages spatial context from DDIM noise maps. By conditioning the reverse process on both noise maps and text embeddings through energy guidance and gradient scaling, NMG preserves input spatial structure while enabling faithful edits. It seamlessly integrates with diverse editing techniques (e.g., Prompt-to-Prompt, MasaCtrl, pix2pix-zero) and remains robust across DDIM inversion variants, delivering faster reconstruction than NTI without compromising quality. The results show improved local and global edits, strong quantitative metrics, and favorable human judgments, highlighting NMG’s practical impact for reliable, high-fidelity real-image editing in diffusion-based frameworks.
Abstract
Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images. However, their application to editing real images often encounters hurdles, primarily because the text condition degrades reconstruction quality, which in turn reduces editing fidelity. Null-text Inversion (NTI) has made strides in this area, but it fails to capture spatial context and requires computationally intensive per-timestep optimization. Addressing these challenges, we present Noise Map Guidance (NMG), an inversion method rich in spatial context and tailored for real-image editing. Notably, NMG achieves this without requiring optimization, yet preserves editing quality. Our empirical investigations highlight NMG's adaptability across various editing techniques and its robustness to variants of DDIM inversion.
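
For readers unfamiliar with the term, the noise maps mentioned above are, as we read it, the intermediate latents produced by DDIM inversion. The sketch below shows only the standard deterministic DDIM update (eta = 0) run forward to collect those latents and backward to reconstruct; it is not the NMG guidance itself. The noise predictor `toy_eps_model`, the schedule `alphas_bar`, and all function names are illustrative assumptions standing in for a trained, text-conditioned network.

```python
import numpy as np

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) between two cumulative-alpha levels."""
    # Predict the clean sample implied by the current latent and noise estimate.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    # Re-noise the prediction to the target level using the same noise estimate.
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alphas_bar):
    """Run DDIM inversion and keep every intermediate latent (the noise maps)."""
    noise_maps = [x0]
    x = x0
    for t in range(len(alphas_bar) - 1):
        eps = eps_model(x, t)  # noise estimate at the current step
        x = ddim_step(x, eps, alphas_bar[t], alphas_bar[t + 1])
        noise_maps.append(x)
    return noise_maps

def ddim_reconstruct(x_last, eps_model, alphas_bar):
    """Run the reverse (sampling) direction from the last noise map."""
    x = x_last
    for t in range(len(alphas_bar) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas_bar[t], alphas_bar[t - 1])
    return x

def toy_eps_model(x, t):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return 0.1 * np.tanh(x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((8, 8))           # stand-in for an image latent
    alphas_bar = np.linspace(0.999, 0.05, 11)  # assumed decreasing schedule
    maps = ddim_invert(x0, toy_eps_model, alphas_bar)
    recon = ddim_reconstruct(maps[-1], toy_eps_model, alphas_bar)
    print("number of noise maps:", len(maps))
    print("max reconstruction error:", np.abs(recon - x0).max())
```

Because inversion evaluates the noise estimate at the current latent while sampling evaluates it at the next one, the round trip is only approximate whenever the predictor depends on its input. Each stored latent has the same spatial layout as the input, which is the spatial context the abstract refers to.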
