Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Vadim Titov, Madina Khalmatova, Alexandra Ivanova, Dmitry Vetrov, Aibek Alanov

TL;DR

Real image editing with diffusion models remains challenging because the target edit must be realized while the original content is preserved. Guide-and-Rescale introduces a tuning-free self-guidance framework that augments classifier-free guidance with a Self-attention Guider and a Feature Guider, plus a noise-rescaling term that stabilizes the editing trajectory. Editing proceeds via DDIM inversion to obtain $z_T$ from the source image using $y_{\mathrm{src}}$, followed by guided denoising conditioned on $y_{\mathrm{trg}}$ with gradient terms $\nabla_{z_t} g$. The approach yields a better trade-off between CLIP and LPIPS scores and a competitive FID across multiple editing tasks without fine-tuning the diffusion model, and the code is publicly available at the provided repository.
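
One way to read the guided denoising step is as a classifier-free guidance update with an additional gradient term. The expression below is only a sketch under assumed notation: the noise predictor $\epsilon_\theta$, the guidance weights $w$ and $v$, and the use of the inversion latent $z^*_t$ as the second argument of $g$ are introduced here for illustration and are not quoted from the paper.

$$\hat{\epsilon}_t = \epsilon_\theta(z_t, t, \varnothing) + w\,\bigl(\epsilon_\theta(z_t, t, y_{\mathrm{trg}}) - \epsilon_\theta(z_t, t, \varnothing)\bigr) + v\,\nabla_{z_t} g(z_t, z^*_t)$$

Under this reading, the first two terms are ordinary classifier-free guidance toward $y_{\mathrm{trg}}$, and the last term pushes $z_t$ toward lower values of the energy $g$, which is what keeps the edit close to the source image.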

Abstract

Despite recent advances in large-scale text-to-image generative models, manipulating real images with these models remains a challenging problem. Existing editing methods either fail to perform with consistent quality across a wide range of image edits, or require time-consuming hyperparameter tuning or fine-tuning of the diffusion model to preserve the image-specific appearance of the input image. We propose a novel approach built upon a diffusion sampling process modified via the guidance mechanism. In this work, we explore the self-guidance technique to preserve the overall structure of the input image and the appearance of its local regions that should not be edited. In particular, we explicitly introduce layout-preserving energy functions that aim to preserve the local and global structures of the source image. Additionally, we propose a noise rescaling mechanism that preserves the noise distribution by balancing the norms of classifier-free guidance and our proposed guiders during generation. Such a guiding approach requires neither fine-tuning the diffusion model nor an exact inversion process. As a result, the proposed method provides a fast and high-quality editing mechanism. In our experiments, we show through human evaluation and quantitative analysis that the proposed method produces the desired edits, is preferred by human evaluators, and achieves a better trade-off between editing quality and preservation of the original image. Our code is available at https://github.com/MACderRu/Guide-and-Rescale.
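
The idea of balancing the norms of classifier-free guidance and the guiders can be illustrated with a small sketch. It is not the paper's exact rescaling rule: the function names, the `ratio` parameter, and the choice of the L2 norm are assumptions made for illustration, and latents are represented as flat Python lists to keep the example dependency-free.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def rescale_guidance(cfg_term, guider_sum, ratio=1.0, eps=1e-12):
    """Scale guider_sum so that its norm equals ratio * ||cfg_term||.

    `ratio` is a hypothetical knob, not a value taken from the paper.
    """
    scale = ratio * l2_norm(cfg_term) / (l2_norm(guider_sum) + eps)
    return [scale * g for g in guider_sum]


def combine_noise(eps_uncond, cfg_delta, guider_sum, cfg_scale=7.5, ratio=1.0):
    """Unconditional noise + scaled CFG term + norm-matched guider term."""
    cfg_term = [cfg_scale * d for d in cfg_delta]
    guide_term = rescale_guidance(cfg_term, guider_sum, ratio)
    return [u + c + g for u, c, g in zip(eps_uncond, cfg_term, guide_term)]
```

Tying the guider magnitude to the CFG magnitude means neither term can dominate the noise estimate at a given step, which is the intuition behind preserving the noise distribution during generation.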

Paper Structure (26 sections, 24 equations, 22 figures, 10 tables, 1 algorithm)

Figures (22)

  • Figure 1: Guide-and-Rescale for real image editing. Our method supports a wide range of edits and achieves a good balance between manipulation quality and preservation of the original image.
  • Figure 2: Overall scheme of the proposed method Guide-and-Rescale. First, our method applies standard DDIM inversion to the real source image. It then edits the image via the standard denoising process. At every denoising step, the noise term is modified by a guider that uses the latents $z_t$ from the current generation process and the time-aligned DDIM latents $z^*_t$.
  • Figure 3: Editing example. From left to right: (1) initial image; (2) naive editing, described in Equation \ref{eq:sampling_naive}; (3) editing with the simple energy function $g$ from Equation \ref{eq:simple_g}; (4) editing with the proposed method.
  • Figure 4: (a) Effect of the proposed guiders (Equation \ref{eq:self_attn}, Equation \ref{eq:features_other}). Applying both guiders jointly preserves both the layout and the visual characteristics of the unedited regions of the image. (b) Illustration of noise rescaling (Equation \ref{eq:noise_resc}). This technique aligns the sum of the guiders with CFG according to the coefficient defined in Equation \ref{eq:scaling_factor_def}, and thereby stabilizes editing and improves its quality.
  • Figure 5: Visual comparison of our method with baselines over different types of editing. Our approach shows more consistent results than existing methods and achieves a better trade-off between editing quality and preservation of the structure of the original image.
  • ...and 17 more figures
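
The scheme described in the Figure 2 caption (invert once, then modify the noise at every denoising step using $z_t$ and the time-aligned $z^*_t$) can be sketched as a short loop. Everything here is a toy stand-in: `predict_noise` and `step` are hypothetical caller-supplied callables rather than a real diffusion model or scheduler, and the energy $g = \lVert z_t - z^*_t \rVert^2$ is an assumed simple choice, not the paper's self-attention or feature guiders.

```python
def simple_energy_grad(z_t, z_star_t):
    """Gradient of the assumed toy energy g = ||z_t - z*_t||^2 w.r.t. z_t."""
    return [2.0 * (a - b) for a, b in zip(z_t, z_star_t)]


def guided_edit(z_T, z_star, predict_noise, step, guidance_weight=0.1):
    """Denoise from z_T, adding a guider gradient to the noise at each step.

    z_star[t] is the stored inversion latent for timestep t (index 0 is the
    source latent); predict_noise(z, t) and step(z, noise, t) are placeholders
    for the model's noise prediction and the sampler update.
    """
    z_t = list(z_T)
    num_steps = len(z_star) - 1
    for t in range(num_steps, 0, -1):
        noise = predict_noise(z_t, t)
        grad = simple_energy_grad(z_t, z_star[t])
        # Adding the energy gradient to the noise moves z_t toward lower g.
        noise = [n + guidance_weight * g for n, g in zip(noise, grad)]
        z_t = step(z_t, noise, t)
    return z_t
```

With a zero noise predictor and a `step` that simply subtracts the noise, the guided latent is pulled toward the inversion trajectory while the unguided one is left unchanged, which mirrors how the guider keeps the edit anchored to the source image.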