Tight Inversion: Image-Conditioned Inversion for Real Image Editing

Edo Kadosh, Nir Goren, Or Patashnik, Daniel Garibi, Daniel Cohen-Or

TL;DR

Real-image editing with diffusion models faces a trade-off between reconstruction and editability. The authors introduce Tight Inversion, which conditions both the inversion and the denoising processes on the input image itself. This narrows $p_{\theta}(z_t \mid c)$ and improves the fidelity of the image reconstructed from the inverted latent $z_T$, as well as of subsequent edits. The method is implemented as a plug-in for existing inversion methods (e.g., DDIM, ReNoise, RF-Inversion), is compatible with standard, few-step, and flow models, and realizes the image condition through IP-Adapter. Extensive experiments show consistent gains in reconstruction metrics and improved editability across multiple editing pipelines. The work also analyzes the conditioning strength and acknowledges a remaining trade-off between reconstruction and editability, pointing to future advances in image conditioning as an opportunity for improvement.
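The intuition that a tighter condition makes DDIM inversion more faithful can be illustrated with a toy sketch. This is not the authors' implementation. Instead of a network with IP-Adapter, it assumes a closed-form noise predictor for a Gaussian-mixture data distribution, and it models the tightest possible condition as a Dirac distribution centered at the input. The helper names `eps_mixture`, `ddim_invert`, and `ddim_denoise`, the linear $\bar\alpha$ schedule, and the 2-D "image" are all hypothetical.

```python
import numpy as np

def eps_mixture(z, abar, means, stds, weights):
    """Closed-form E[eps | z_t = z] when the (conditional) data distribution is
    sum_k w_k N(mu_k, s_k^2 I) and z_t = sqrt(abar) * z_0 + sqrt(1 - abar) * eps.
    A single component with s = 0 (a Dirac at the input) is the tightest condition."""
    z = np.asarray(z, dtype=float)
    log_r, eps_k = [], []
    for mu, s, w in zip(means, stds, weights):
        var = abar * s**2 + (1.0 - abar)                 # Var(z_t | k), per dimension
        diff = z - np.sqrt(abar) * np.asarray(mu, dtype=float)
        log_r.append(np.log(w) - 0.5 * diff @ diff / var - 0.5 * z.size * np.log(var))
        eps_k.append(np.sqrt(1.0 - abar) * diff / var)   # E[eps | z_t, k]
    log_r = np.array(log_r)
    r = np.exp(log_r - log_r.max())
    r /= r.sum()                                         # responsibilities p(k | z_t)
    return r @ np.array(eps_k)

def ddim_step(z, a_from, a_to, eps):
    """One deterministic DDIM update from noise level a_from to a_to."""
    z0_hat = (z - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * z0_hat + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, abars, eps_fn):
    """Inversion (cleanest -> noisiest): reuse eps predicted at the current latent."""
    z = np.asarray(x0, dtype=float)
    for a_cur, a_next in zip(abars[:-1], abars[1:]):
        z = ddim_step(z, a_cur, a_next, eps_fn(z, a_cur))
    return z

def ddim_denoise(zT, abars, eps_fn):
    """Deterministic sampling (noisiest -> cleanest) with the same condition."""
    z = np.asarray(zT, dtype=float)
    for a_cur, a_prev in zip(abars[:0:-1], abars[-2::-1]):
        z = ddim_step(z, a_cur, a_prev, eps_fn(z, a_cur))
    return z

def reconstruction_error(x0, abars, eps_fn):
    zT = ddim_invert(x0, abars, eps_fn)
    return float(np.linalg.norm(ddim_denoise(zT, abars, eps_fn) - x0))

if __name__ == "__main__":
    abars = np.linspace(0.9999, 0.01, 51)   # toy schedule, cleanest -> noisiest
    x0 = np.array([0.3, 1.2])               # the toy "input image"
    # Broad "null" condition: a two-component mixture that does not pin down x0.
    null = lambda z, a: eps_mixture(z, a, [[-2.0, 0.0], [2.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
    # Tightest condition: a Dirac at the input itself.
    tight = lambda z, a: eps_mixture(z, a, [x0], [0.0], [1.0])
    print("null condition :", reconstruction_error(x0, abars, null))
    print("tight condition:", reconstruction_error(x0, abars, tight))
```

Under this idealization, the predicted $\hat z_0$ no longer depends on the latent, so the noise reused during inversion equals the one the sampler later predicts and the round trip is exact up to floating-point error. With the broad two-component condition, the reused prediction is only approximate and the reconstruction generally drifts.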

Abstract

Text-to-image diffusion models offer powerful image editing capabilities. To edit real images, many methods rely on inverting the image into Gaussian noise. A common approach is to gradually add noise to the image, where the noise is determined by reversing the sampling equation. This process has an inherent trade-off between reconstruction and editability, which limits the editing of challenging images such as highly detailed ones. Recognizing that inversion in text-to-image models relies on a text condition, this work explores the importance of the choice of condition. We show that a condition that precisely aligns with the input image significantly improves inversion quality. Based on these findings, we introduce Tight Inversion, an inversion method that uses the most precise condition possible: the input image itself. This tight condition narrows the distribution of the model's output and enhances both reconstruction and editability. We demonstrate the effectiveness of our approach in combination with existing inversion methods through extensive experiments that evaluate reconstruction accuracy as well as integration with various editing methods.
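For reference, the deterministic DDIM step and the inversion obtained by reversing it can be written as follows. The notation is assumed (standard DDIM, with $\bar\alpha_t$ the cumulative noise schedule and $c$ the condition) and is not necessarily the paper's own.

```latex
% Deterministic DDIM denoising step under condition c
\hat{z}_0(z_t) = \frac{z_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(z_t, t, c)}{\sqrt{\bar{\alpha}_t}},
\qquad
z_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{z}_0(z_t) + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(z_t, t, c)

% DDIM inversion: solve the step from z_{t+1} to z_t for z_{t+1}, replacing the
% unknown \epsilon_\theta(z_{t+1}, t+1, c) with \epsilon_\theta(z_t, t, c)
z_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\,\hat{z}_0(z_t) + \sqrt{1-\bar{\alpha}_{t+1}}\,\epsilon_\theta(z_t, t, c)
```

The approximation in the last line is the source of the reconstruction error in DDIM inversion. As described in the abstract, Tight Inversion does not alter this update. Instead, it sets the condition $c$ to the input image.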

Paper Structure

This paper contains 18 sections, 2 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Our Tight Inversion method facilitates the editing of highly-detailed challenging real images across different models.
  • Figure 2: Each row presents a real, highly detailed image followed by reconstruction results obtained with progressively more precise conditions during inversion and denoising. As shown, increasing the precision of the condition improves reconstruction accuracy. The rightmost column uses the ultimate condition, the input image itself, which yields the highest reconstruction fidelity. No classifier-free guidance (CFG) was applied during inversion or denoising in any of the presented results.
  • Figure 3: We train a toy conditional continuous normalizing flow (CNF) model to analyze the importance of the condition used during inversion. The prior distribution is a single Gaussian, and the posterior consists of five Gaussians. (a) shows denoising trajectories from the prior, and (b)-(d) show inversion and denoising trajectories for points from the posterior. In (b), a null condition is used for both processes; in (c), the condition matches the Gaussian from which the point was sampled; and in (d), the condition corresponds to the adjacent Gaussian. Lines connect corresponding points on the inversion and denoising trajectories to illustrate the offsets between the two processes.
  • Figure 4: Using a descriptive condition in DDIM inversion results in improved reconstruction. As shown, image conditioning outperforms text conditioning. The benefit of our method is particularly evident in challenging images with intricate details.
  • Figure 5: Qualitative reconstruction results with SDXL. Integrating Tight Inversion with various inversion methods enhances reconstruction. Observe the reflection on the window in the second column.
  • ...and 8 more figures
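The toy experiment of Figure 3 can be approximated without training. This sketch relies on several stated assumptions. The trained CNF is replaced by the exact flow-matching velocity of an equal-weight mixture of five 2-D Gaussians, with $x_t = (1-t)\,x_{\text{data}} + t\,\epsilon$. The Gaussians sit on a circle of radius 3 with standard deviation 0.3, which is a hypothetical layout. Inversion and denoising both use plain Euler steps. The "null" condition is the full mixture, the "matched" condition is the component the point was drawn from, and the "adjacent" condition is a neighboring component. All function names are illustrative.

```python
import numpy as np

# Hypothetical toy setup (not the paper's): five 2-D Gaussians on a circle.
ANGLES = 2 * np.pi * np.arange(5) / 5
MEANS = 3.0 * np.stack([np.cos(ANGLES), np.sin(ANGLES)], axis=1)  # shape (5, 2)
STD = 0.3

def velocity(x, t, means, std=STD):
    """Exact velocity E[noise - data | x_t = x] for an equal-weight Gaussian mixture,
    where x_t = (1 - t) * data + t * noise and noise ~ N(0, I)."""
    means = np.atleast_2d(means)                   # a single mean -> shape (1, 2)
    var = (1 - t) ** 2 * std**2 + t**2             # Var(x_t | component), per dimension
    diff = x - (1 - t) * means                     # (K, 2)
    v_k = (t - (1 - t) * std**2) / var * diff - means
    logw = -0.5 * np.sum(diff**2, axis=1) / var    # equal priors, shared variance
    w = np.exp(logw - logw.max())
    w /= w.sum()                                   # posterior over components
    return w @ v_k

def invert(x0, means, n_steps):
    """Euler integration from data (t = 0) to the prior (t = 1)."""
    x, h = np.array(x0, dtype=float), 1.0 / n_steps
    for i in range(n_steps):
        x = x + h * velocity(x, i * h, means)
    return x

def denoise(x1, means, n_steps):
    """Euler integration from the prior (t = 1) back to data (t = 0)."""
    x, h = np.array(x1, dtype=float), 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        x = x - h * velocity(x, i * h, means)
    return x

def round_trip_error(x0, means, n_steps=10):
    x_rec = denoise(invert(x0, means, n_steps), means, n_steps)
    return float(np.linalg.norm(x_rec - x0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 0
    x0 = MEANS[k] + STD * rng.standard_normal(2)   # a point drawn from component k
    conditions = {
        "null (full mixture)": MEANS,
        "matched": MEANS[k],
        "adjacent": MEANS[(k + 1) % 5],
    }
    for name, cond in conditions.items():
        print(f"{name:>20}: {round_trip_error(x0, cond):.4f}")
```

Comparing the printed round-trip errors for the three conditions loosely mirrors panels (b)-(d). The exact numbers depend on the assumed layout, spread, and number of Euler steps.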