InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models

Yan Zheng, Lemeng Wu

TL;DR

This paper introduces GEO, a versatile, training-free image editing technique that caters to customized user requirements at both local and global scales. It is driven by a novel geometric accumulation loss that enhances DDIM inversion to faithfully preserve pixel-space geometry and layout.

Abstract

In this paper, we introduce Geometry-Inverse-Meet-Pixel-Insert, GEO for short, an exceptionally versatile image editing technique designed to cater to customized user requirements at both local and global scales. Our approach seamlessly integrates text prompts and image prompts to yield diverse and precise editing outcomes. Notably, our method operates without the need for training and is driven by two key contributions: (i) a novel geometric accumulation loss that enhances DDIM inversion to faithfully preserve pixel-space geometry and layout, and (ii) an innovative boosted image prompt technique that combines pixel-level editing for text-only inversion with latent-space geometry guidance for standard classifier-free reversion. Leveraging the publicly available Stable Diffusion model, our approach undergoes extensive evaluation across various image types and challenging prompt editing scenarios, consistently delivering high-fidelity editing results for real images.
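
For orientation, the plain DDIM inversion that the abstract says GEO enhances can be sketched as below. This is only the standard deterministic DDIM update run in the noising direction, not the authors' method: the geometric accumulation loss is not defined in this excerpt and is deliberately left out. The names `ddim_invert_step`, `ddim_invert`, `eps_model`, and the `alphas` schedule are hypothetical and chosen for illustration.

```python
import numpy as np


def ddim_invert_step(x_t, eps, alpha_t, alpha_next):
    """One deterministic DDIM step from cumulative alpha `alpha_t` to `alpha_next`.

    `eps` is the noise predicted at the current step. Moving to a smaller
    alpha (more noise) is inversion; moving to a larger alpha is sampling.
    """
    # Estimate the clean sample implied by the current latent and noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    # Re-noise that estimate to the target noise level with the same eps.
    return np.sqrt(alpha_next) * x0_pred + np.sqrt(1.0 - alpha_next) * eps


def ddim_invert(x0, eps_model, alphas):
    """Run inversion along `alphas`, ordered from clean (near 1) to noisy (near 0).

    `eps_model(x, alpha)` is a stand-in (assumption) for a noise predictor
    such as a diffusion U-Net. Returns every latent visited on the way.
    """
    x = x0
    trajectory = [x]
    for a_t, a_next in zip(alphas[:-1], alphas[1:]):
        eps = eps_model(x, a_t)
        x = ddim_invert_step(x, eps, a_t, a_next)
        trajectory.append(x)
    return trajectory
```

In practice the predicted noise changes between steps, so the inverted trajectory only approximately reconstructs the input image. That approximation error is the kind of drift an additional inversion-time loss would aim to reduce.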

Paper Structure (19 sections, 9 equations, 6 figures, 2 algorithms)

This paper contains 19 sections, 9 equations, 6 figures, 2 algorithms.

Figures (6)

  • Figure 1: Pipeline of GEO. We first take a pixel-level causal edit by the user and a text edit prompt as input. The DDIM inversion process reverts the pixel-edited image back to a latent. During this process, we apply the geometric accumulation loss to retain the latent information from both the pixel edit space and the text prompt guidance. Compared with naive DDIM inversion, this gives a better decoded image during inversion and, as a result, a more finely detailed final result.
  • Figure 2: Editing examples for various input images with different styles of image prompt. We take image prompts from SDEdit, stickers, user strokes, and brushes. Combined with the text prompt input, GEO can refine the image prompt into a high-fidelity result that meets the requirements of both the image and text inputs.
  • Figure 3: Multi-area editing results from different methods. Our method accurately captures the multi-area edit requirements from the image and text prompts, while other methods tend to merge the concepts of the different areas together.
  • Figure 4: Custom editing results. For the same image and the same text prompt, the style can easily be changed according to the input image prompt. This allows users to customize their own style by brushing, stroking, or even pasting a sticker from another image.
  • Figure 5: Ablation study on detail-preserving ability between DDIM inversion and GEO. DDIM inversion tends to distort the background during editing. GEO edits the prompted area accurately while keeping the background area unchanged.
  • ...and 1 more figures