Tight Inversion: Image-Conditioned Inversion for Real Image Editing
Edo Kadosh, Nir Goren, Or Patashnik, Daniel Garibi, Daniel Cohen-Or
TL;DR
Real-image editing with diffusion models faces a trade-off between reconstruction and editability. The authors introduce Tight Inversion, which conditions both inversion and denoising on the input image itself. This narrows the conditional distribution $p_{\theta}(z_t|c)$, improving the fidelity of reconstruction from the inverted noise $z_T$ and of subsequent edits. The method works as a plug-in for existing inversion methods (e.g., DDIM, ReNoise, RF-Inversion), is compatible with standard, few-step, and flow models, and obtains its image conditioning via IP-Adapter. Experiments show consistent gains in reconstruction metrics and improved editability across multiple editing pipelines. The work also analyzes conditioning strength, acknowledges a trade-off between reconstruction and editability, and points to opportunities for future work on image conditioning.
Abstract
Text-to-image diffusion models offer powerful image editing capabilities. To edit real images, many methods rely on inverting the image into Gaussian noise. A common approach is to gradually add noise to the image, where the noise is determined by reversing the sampling equation. This process has an inherent trade-off between reconstruction and editability, which limits the editing of challenging images such as highly detailed ones. Recognizing that inversion with text-to-image models relies on a text condition, this work explores the importance of the choice of condition. We show that a condition that precisely aligns with the input image significantly improves the inversion quality. Based on our findings, we introduce Tight Inversion, an inversion method that uses the most precise condition possible: the input image itself. This tight condition narrows the distribution of the model's output and enhances both reconstruction and editability. We demonstrate the effectiveness of our approach when combined with existing inversion methods through extensive experiments, evaluating both reconstruction accuracy and integration with various editing methods.
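The idea of inverting an image by reversing the sampling equation can be illustrated with a minimal NumPy sketch of deterministic DDIM-style inversion and sampling. This is not the authors' implementation. The names `ddim_step`, `ddim_invert`, `ddim_sample`, `eps_fn`, and `alpha_bar` are hypothetical, and `toy_eps` is a stand-in for a real noise-prediction network. In Tight Inversion, the condition `cond` would carry the input image itself (via IP-Adapter, per the summary above); here it is simply passed through to the toy predictor.

```python
import numpy as np

def ddim_step(x, eps, a_from, a_to):
    """Move x from noise level a_from to a_to using the deterministic DDIM update."""
    # Estimate the clean signal implied by the current sample and predicted noise.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    # Re-noise that estimate to the target level with the same predicted noise.
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, cond, eps_fn, alpha_bar):
    """Image -> noise: run the sampling equation in reverse (increasing noise)."""
    x = x0
    for t in range(len(alpha_bar) - 1):
        # Approximation: the noise predicted at step t is reused for step t + 1.
        eps = eps_fn(x, t, cond)
        x = ddim_step(x, eps, alpha_bar[t], alpha_bar[t + 1])
    return x

def ddim_sample(x_T, cond, eps_fn, alpha_bar):
    """Noise -> image: the usual deterministic sampling direction."""
    x = x_T
    for t in range(len(alpha_bar) - 1, 0, -1):
        eps = eps_fn(x, t, cond)
        x = ddim_step(x, eps, alpha_bar[t], alpha_bar[t - 1])
    return x

# Toy setup (assumption): a predictor that depends only on the condition.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)               # stand-in for an image latent
alpha_bar = np.linspace(0.999, 0.01, 51)  # cumulative signal levels, index 0 = least noisy
toy_eps = lambda x, t, cond: 0.1 * cond   # hypothetical noise predictor

z_T = ddim_invert(x0, x0, toy_eps, alpha_bar)    # condition on the "image" itself
recon = ddim_sample(z_T, x0, toy_eps, alpha_bar)
error = float(np.max(np.abs(recon - x0)))
```

Because the toy predictor ignores `x` and `t`, the inversion approximation is exact and the round trip recovers `x0` up to floating-point error. A real network's prediction changes between steps, so the round trip is only approximate; the abstract's claim is that a tighter condition reduces that gap.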
