
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Linoy Tsaban, Apolinário Passos

TL;DR

This work addresses the challenge of editing real images with diffusion models by integrating two powerful approaches: DDPM inversion for faithful real-image reconstruction and Semantic Guidance (SEGA) for fine-grained semantic control. The authors propose LEDITS, a lightweight method that extends SEGA to inverted real images and combines it with DDPM inversion, enabling versatile edits from subtle to substantial while preserving fidelity. In qualitative experiments, LEDITS achieves results competitive with state-of-the-art methods, offering flexible control by jointly leveraging inversion and semantic guidance without architectural changes. The approach makes real-image editing more practical by delivering diverse, semantically coherent edits while maintaining image fidelity and avoiding heavy computational overhead.
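
As a rough illustration of how SEGA-style semantic control can act on a single noise estimate, the sketch below does two things. It applies classifier-free guidance toward a target prompt, and then adds one masked guidance term per edit concept. It is a simplified sketch under stated assumptions, not the authors' implementation:
- The names `sega_noise_estimate`, `edit_scales`, `directions` and `thresholds` are hypothetical.
- NumPy arrays stand in for the model's noise predictions.
- The quantile mask is computed over the whole array.
- SEGA's warm-up and momentum terms are omitted.

```python
import numpy as np
from typing import Sequence


def sega_noise_estimate(
    eps_uncond: np.ndarray,
    eps_target: np.ndarray,
    eps_concepts: Sequence[np.ndarray],
    target_scale: float,
    edit_scales: Sequence[float],
    directions: Sequence[int],
    thresholds: Sequence[float],
) -> np.ndarray:
    """Simplified SEGA-style estimate (hypothetical helper; no warm-up or momentum)."""
    # Classifier-free guidance toward the (possibly empty) target prompt.
    eps = eps_uncond + target_scale * (eps_target - eps_uncond)
    for eps_c, scale, direction, quantile in zip(
        eps_concepts, edit_scales, directions, thresholds
    ):
        # Concept direction; direction = -1 steers away from the concept.
        psi = direction * (eps_c - eps_uncond)
        # Keep only elements whose magnitude reaches the given quantile of |psi|
        # (assumption: quantile taken over the whole array).
        cutoff = np.quantile(np.abs(psi), quantile)
        mask = np.abs(psi) >= cutoff
        # Add the masked, scaled guidance term for this concept.
        eps = eps + scale * mask * psi
    return eps


# Toy usage with hand-made "noise predictions".
eps_u = np.zeros(4)
eps_c = np.array([0.0, 1.0, 2.0, 10.0])
print(sega_noise_estimate(eps_u, eps_u, [eps_c], 7.5, [2.0], [1], [0.75]))
```

In the toy call, only the largest-magnitude element survives the 0.75-quantile mask. The mask does not depend on the edit scale, so for a fixed mask the added term grows linearly with `edit_scales[i]`. This is one way to see why strengthening or weakening a concept changes the result gradually. A direction of -1 pushes the estimate away from the concept instead of toward it.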

Abstract

Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, significant effort is being devoted to enabling the modification of these images using text alone, as a means of offering intuitive and versatile editing. However, editing proves difficult for these generative models because of the inherent nature of editing techniques, which involves preserving certain content from the original image. Conversely, in text-based models, even minor modifications to the text prompt frequently produce an entirely different result, making one-shot generation that accurately corresponds to the user's intent exceedingly challenging. In addition, to edit a real image using these state-of-the-art tools, one must first invert the image into the pre-trained model's domain, which adds another factor affecting edit quality as well as latency. In this exploratory report, we propose LEDITS, a combined lightweight approach for real-image editing that incorporates the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real-image editing while also harnessing the editing capabilities of DDPM inversion. This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
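
The Edit Friendly DDPM inversion step can be sketched in two stages. First, the input is noised to every timestep with independent Gaussian samples. Then each noise map is solved for so that a DDPM step from one noisy sample lands exactly on the previous one.

This is a minimal NumPy sketch under stated assumptions, not the authors' code:
- It assumes the standard DDPM reverse mean with reverse-step variance sigma_t^2 = beta_t.
- `edit_friendly_inversion` and `ddpm_mean` are hypothetical names.
- The callable `eps_model` is a placeholder for the text-conditioned denoiser evaluated with the source prompt.

```python
import numpy as np
from typing import Callable, List, Tuple

# Placeholder (hypothetical) for the text-conditioned denoiser with the source prompt.
EpsModel = Callable[[np.ndarray, int], np.ndarray]


def ddpm_mean(x_t: np.ndarray, t: int, eps: np.ndarray, betas: np.ndarray) -> np.ndarray:
    """Mean of x_{t-1} given x_t and a noise estimate (DDPM, 1-based t)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    coef = betas[t - 1] / np.sqrt(1.0 - alpha_bar[t - 1])
    return (x_t - coef * eps) / np.sqrt(alphas[t - 1])


def edit_friendly_inversion(
    x0: np.ndarray, eps_model: EpsModel, betas: np.ndarray, rng: np.random.Generator
) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Return noisy samples xs[0..T] and noise maps zs[1..T] (zs[0] unused)."""
    T = len(betas)
    alpha_bar = np.cumprod(1.0 - betas)
    # Noise x0 to every timestep with independent Gaussian samples.
    xs = [x0]
    for t in range(1, T + 1):
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * noise)
    # Solve for the noise map that makes each DDPM step land exactly on xs[t - 1],
    # assuming the reverse-step standard deviation sigma_t = sqrt(beta_t).
    zs = [np.zeros_like(x0) for _ in range(T + 1)]
    for t in range(T, 0, -1):
        mean = ddpm_mean(xs[t], t, eps_model(xs[t], t), betas)
        zs[t] = (xs[t - 1] - mean) / np.sqrt(betas[t - 1])
    return xs, zs
```

Replaying the reverse process from `xs[T]` with the same `eps_model` and these noise maps recovers the input up to floating-point error. LEDITS instead replays it with a different noise estimate, the target prompt together with semantic guidance, to obtain the edited image.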

Paper Structure (11 sections, 3 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: LEDITS (DDPM inversion with semantic guidance) for real image editing. Real images edited purely with DDPM inversion and with both DDPM inversion and semantic guidance (LEDITS). In this combined approach, we first apply DDPM inversion to the input image and then edit by performing the reverse diffusion process using the inverted latents and the desired target prompt, together with semantic guidance.
  • Figure 2: LEDITS overview. Top: inversion of the input image. We first apply DDPM inversion to the original image to obtain the inverted latents and corresponding noise maps. Bottom: we use the inverted latents to drive the reverse diffusion process with semantic guidance. At each denoising step, we compute the noise estimate according to the SEGA logic and the updated latents according to the DDPM scheme, using the pre-computed noise maps.
  • Figure 3: Image editing with LEDITS. LEDITS offers fine-grained control over edit operations, adding flexibility and versatility. We show images edited purely with DDPM inversion (fourth column from the right) and images edited with LEDITS, using both methods simultaneously (three leftmost and three rightmost columns). These images were edited using the described target prompt (in black) together with SEGA concepts (in blue). SEGA semantic vectors retain their monotonic scaling property when used in LEDITS: the gradual effect of increasing or decreasing the strength of SEGA concepts can be observed from the third column from the right to the rightmost column, and from the third column from the left to the leftmost column.
  • Figure 4: Comparisons. We show results for editing real images using pure DDPM inversion, DDPM inversion with prompt-to-prompt, and LEDITS, respectively. The results shown here were obtained with the first editing workflow, using DDPM purely for inversion and SEGA for editing. All images were generated using the same seed.
  • Figure 5: Parameter effect in DDPM inversion vs. LEDITS. We show the effect of the skip-steps and target-guidance-scale parameters on the output image when using pure DDPM inversion (top panel), compared with the effect of the edit-concept guidance scales when using LEDITS.
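
The denoising loop described for Figure 2 can be sketched roughly as follows: compute a noise estimate at each step, then apply a DDPM update with the pre-computed noise map. The sketch also includes the skip-steps parameter from Figure 5, treated as a starting offset into the inverted latents.

This is a hedged sketch, not the authors' code:
- `ledits_denoise` and `noise_estimate` are hypothetical names.
- `noise_estimate(x, t)` stands in for the SEGA-guided estimate.
- The reverse-step variance sigma_t^2 = beta_t is an assumption.
- Reading "skip steps" as starting the reverse pass at T - skip is our interpretation, not a confirmed detail.

```python
import numpy as np
from typing import Callable, Sequence

# Hypothetical stand-in for the SEGA-guided noise estimate at step t.
NoiseEstimate = Callable[[np.ndarray, int], np.ndarray]


def ledits_denoise(
    xs: Sequence[np.ndarray],
    zs: Sequence[np.ndarray],
    betas: np.ndarray,
    noise_estimate: NoiseEstimate,
    skip: int = 0,
) -> np.ndarray:
    """Reverse DDPM pass driven by pre-computed noise maps.

    xs[t] and zs[t] are the inverted latents and noise maps for t = 1..T
    (xs[0] is the input, zs[0] is unused); betas[t - 1] holds beta_t.
    """
    T = len(betas)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    # Assumption: skip steps = begin the reverse pass at T - skip.
    t_start = T - skip
    x = xs[t_start]
    for t in range(t_start, 0, -1):
        # Guided noise estimate (SEGA in LEDITS; any callable here).
        eps = noise_estimate(x, t)
        # DDPM mean, then add the stored noise map instead of fresh noise.
        coef = betas[t - 1] / np.sqrt(1.0 - alpha_bar[t - 1])
        mean = (x - coef * eps) / np.sqrt(alphas[t - 1])
        x = mean + np.sqrt(betas[t - 1]) * zs[t]
    return x
```

Suppose `noise_estimate` reproduces the estimate used during inversion and `skip` is 0. In that case the loop retraces the inverted trajectory back to the input. Plugging in a SEGA-guided estimate built from the target prompt and edit concepts instead produces the edit. A larger `skip` starts from a less noisy latent, which in this sketch leaves fewer steps for the guidance to act.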