ReNoise: Real Image Inversion Through Iterative Noising

Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Daniel Cohen-Or

TL;DR

ReNoise addresses the challenge of faithfully inverting real images into diffusion models to enable text-guided editing, particularly for few-step and time-distilled models. By embedding a fixed-point iterative renoising procedure at each inversion step and averaging multiple renoised predictions, ReNoise achieves higher reconstruction quality without increasing the overall operation count. The method is augmented with editability-enforcing losses and noise-correction strategies to preserve editability while maintaining fidelity. Extensive experiments across SD, SDXL variants, and LCM LoRA demonstrate improved reconstruction and faster edit workflows, with robust performance across deterministic and non-deterministic samplers. Overall, ReNoise functions as a versatile meta-algorithm for diffusion inversion that enhances both accuracy and editability in real-image editing scenarios.

Abstract

Recent advancements in text-guided diffusion models have unlocked powerful image manipulation capabilities. However, applying these methods to real images necessitates the inversion of the images into the domain of the pretrained diffusion model. Achieving faithful inversion remains a challenge, particularly for more recent models trained to generate images with a small number of denoising steps. In this work, we introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations. Building on reversing the diffusion sampling process, our method employs an iterative renoising mechanism at each inversion sampling step. This mechanism refines the approximation of a predicted point along the forward diffusion trajectory, by iteratively applying the pretrained diffusion model, and averaging these predictions. We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models. Through comprehensive evaluations and comparisons, we show its effectiveness in terms of both accuracy and speed. Furthermore, we confirm that our method preserves editability by demonstrating text-driven image editing on real images.

Paper Structure (39 sections, 14 equations, 20 figures, 3 tables, 1 algorithm)

Figures (20)

  • Figure 1: Our ReNoise inversion technique can be applied to various diffusion models, including recent few-step ones. This figure illustrates the performance of our method with SDXL Turbo and LCM models, showing its effectiveness compared to DDIM inversion. As illustrated on the right, the quality of our inversions also allows prompt-driven image edits.
  • Figure 2: The diffusion process samples Gaussian noise and iteratively denoises it until reaching the data distribution. At each point along the denoising trajectory, the model predicts a direction, $\epsilon_\theta(z_t)$, to step to the next point along the trajectory. To invert a given image from the distribution, the direction from $z_t$ to $z_{t+1}$ is approximated with the inverse of the direction from $z_t$ to $z_{t-1}$, denoted by a dotted blue line.
  • Figure 3: Comparing reconstruction results of plain DDIM inversion (middle column) on SDXL to DDIM inversion with one ReNoise iteration (rightmost column).
  • Figure 4: Method overview. Given an image $z_0$, we iteratively compute $z_1, \ldots, z_T$, where each $z_t$ is calculated from $z_{t-1}$. At each time step, we apply the UNet ($\epsilon_\theta$) $\mathcal{K}+1$ times, each using a better approximation of $z_t$ as the input. The initial approximation is $z_{t-1}$. The next one, $z_t^{(1)}$, is the result of the reversed sampler step (i.e., DDIM). The reversed step begins at $z_{t-1}$ and follows the direction of $\epsilon_\theta(z_{t-1}, t)$. At the $k$-th renoising iteration, $z_t^{(k)}$ is the input to the UNet, and we obtain a better $z_t$ approximation. For the last iterations, we optimize $\epsilon_\theta(z_{t}^{(k)}, t)$ to increase editability. As the final denoising direction, we use the average of the UNet predictions of the last few iterations.
  • Figure 5: Geometric intuition for ReNoise. At each inversion step, we are trying to estimate $z_t$ (marked with a red star) based on $z_{t-1}$. The straightforward approach is to use the reverse direction of the denoising step from $z_{t-1}$, assuming the trajectory is approximately linear. However, this assumption is inaccurate, especially in few-step models, where the size of the steps is not small. We use the linearity assumption only as an initial estimation and keep improving the estimation. We recalculate the denoising step from the previous estimation (which is closer to $z_t$) and then proceed with its opposite direction from $z_{t-1}$ (see the orange vectors).
  • ...and 15 more figures
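
The renoising loop described in the captions of Figures 4 and 5 can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the authors' implementation: the latent is a single float rather than an image tensor, `toy_eps` is a hypothetical stand-in for the UNet $\epsilon_\theta$, the cumulative noise-schedule values `abar_prev` and `abar_t` are arbitrary, and all function names are invented for this example. The editability optimization and noise correction mentioned in the TL;DR are omitted.

```python
import math

def ddim_inverse_step(z_prev, eps, abar_prev, abar_t):
    """Reversed DDIM update: move from z_{t-1} to z_t along the direction eps."""
    x0_pred = (z_prev - math.sqrt(1.0 - abar_prev) * eps) / math.sqrt(abar_prev)
    return math.sqrt(abar_t) * x0_pred + math.sqrt(1.0 - abar_t) * eps

def ddim_denoise_step(z_t, eps, abar_prev, abar_t):
    """Standard deterministic DDIM update: move from z_t to z_{t-1}."""
    x0_pred = (z_t - math.sqrt(1.0 - abar_t) * eps) / math.sqrt(abar_t)
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1.0 - abar_prev) * eps

def renoise_step(z_prev, t, eps_model, abar_prev, abar_t, num_iters=4, num_avg=2):
    """One inversion step with num_iters renoising iterations (num_iters + 1 model calls).

    With num_iters=0 and num_avg=1 this reduces to plain DDIM inversion.
    """
    z_est = z_prev  # initial approximation of z_t is z_{t-1}
    preds = []
    for _ in range(num_iters + 1):
        eps = eps_model(z_est, t)  # predict the direction at the current estimate
        preds.append(eps)
        # Always restart from z_{t-1}; only the direction is refined.
        z_est = ddim_inverse_step(z_prev, eps, abar_prev, abar_t)
    tail = preds[-num_avg:]  # average the predictions of the last few iterations
    eps_avg = sum(tail) / len(tail)
    return ddim_inverse_step(z_prev, eps_avg, abar_prev, abar_t)

def round_trip_error(z_prev, z_t, t, eps_model, abar_prev, abar_t):
    """Denoise z_t by one step and measure the distance to the original z_{t-1}."""
    z_back = ddim_denoise_step(z_t, eps_model(z_t, t), abar_prev, abar_t)
    return abs(z_back - z_prev)

def toy_eps(z, t):
    """Hypothetical smooth noise predictor standing in for the UNet (ignores t)."""
    return 0.5 * math.tanh(z)

# Arbitrary toy values: one large step, as in a few-step model.
z0, abar_prev, abar_t = 1.0, 0.9, 0.5

z_ddim = renoise_step(z0, 1, toy_eps, abar_prev, abar_t, num_iters=0, num_avg=1)
z_renoise = renoise_step(z0, 1, toy_eps, abar_prev, abar_t, num_iters=4, num_avg=2)

err_ddim = round_trip_error(z0, z_ddim, 1, toy_eps, abar_prev, abar_t)
err_renoise = round_trip_error(z0, z_renoise, 1, toy_eps, abar_prev, abar_t)
print(f"plain DDIM inversion error: {err_ddim:.2e}")
print(f"ReNoise inversion error:    {err_renoise:.2e}")
```

In this toy setting the iteration is a contraction, so each renoising pass brings the estimate closer to a point that denoises back to $z_{t-1}$, and the round-trip error falls well below that of the single reversed step. Whether the iteration converges for a real UNet depends on the model and the step size, which this sketch does not address.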