Reversible Inversion for Training-Free Exemplar-guided Image Editing

Yuke Li, Lianli Gao, Ji Zhang, Pengpeng Zeng, Lichuan Xiang, Hongkai Wen, Heng Tao Shen, Jingkuan Song

TL;DR

This work tackles the high cost of training-based exemplar-guided image editing by introducing Reversible Inversion (ReInversion), a training-free framework that combines a forward Reconstruction-Based Inversion with a two-stage denoising process. The first stage preserves the source structure and the second injects reference attributes; an optional Mask-Guided Selective Denoising (MSD) step constrains edits to target regions. In extensive experiments on COCOEE, ReInversion achieves state-of-the-art quality, foreground consistency, and background preservation while substantially reducing computation (NFEs) and latency, and it includes a faster deterministic variant. The results demonstrate that training-free, inversion-based EIE can match or surpass approaches that rely on large-scale pre-training in both fidelity and efficiency, offering practical benefits for real-time and resource-constrained editing tasks.

Abstract

Exemplar-guided Image Editing (EIE) aims to modify a source image according to a visual reference. Existing approaches often require large-scale pre-training to learn relationships between the source and reference images, incurring high computational costs. As a training-free alternative, inversion techniques can be used to map the source image into a latent space for manipulation. However, our empirical study reveals that standard inversion is sub-optimal for EIE, leading to poor quality and inefficiency. To tackle this challenge, we introduce Reversible Inversion (ReInversion) for effective and efficient EIE. Specifically, ReInversion operates as a two-stage denoising process, which is first conditioned on the source image and subsequently on the reference. In addition, we introduce a Mask-Guided Selective Denoising (MSD) strategy to constrain edits to target regions, preserving the structural consistency of the background. Both qualitative and quantitative comparisons demonstrate that our ReInversion method achieves state-of-the-art EIE performance with the lowest computational overhead.
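
The two-stage idea described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: `toy_velocity` is a hypothetical closed-form stand-in for the pretrained flow model, the switch time `t_tau` borrows its name from the hyperparameter in Figure 5 but its exact role here is guessed, and blending source- and reference-conditioned updates with a binary mask is only one plausible reading of MSD.

```python
import numpy as np


def toy_velocity(x, t, cond):
    """Hypothetical stand-in for a flow model's velocity prediction.

    For the straight-line path x_t = (1 - t) * cond + t * noise, the exact
    velocity is noise - cond, which equals (x_t - cond) / t.
    """
    return (x - cond) / t


def two_stage_denoise(noise, source, reference, mask, t_tau=0.5, num_steps=10):
    """Euler denoising from t=1 to t=0 with a condition switch at t_tau.

    Stage 1 (t > t_tau): every pixel follows the source-conditioned velocity.
    Stage 2 (t <= t_tau): pixels inside `mask` follow the reference-conditioned
    velocity, pixels outside keep following the source (assumed MSD-style rule).
    """
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = noise.copy()
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t - t_next
        x_src = x - dt * toy_velocity(x, t, source)
        if t > t_tau:
            x = x_src
        else:
            x_ref = x - dt * toy_velocity(x, t, reference)
            x = np.where(mask, x_ref, x_src)
    return x


# Toy 8x8 "images" with values in [0, 1] and a square edit region.
rng = np.random.default_rng(0)
source = rng.random((8, 8))
reference = rng.random((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
noise = rng.standard_normal((8, 8))

out = two_stage_denoise(noise, source, reference, mask)
```

With this toy velocity the final Euler step lands exactly on the active condition, so `out` matches `reference` inside the mask and `source` outside it. A real model would only approximate this behavior, and the choice of `t_tau` would trade structure preservation against reference fidelity.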

Paper Structure

This paper contains 20 sections, 15 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Exemplar-guided image editing results of our proposed training-free method ReInversion. The left column shows the source images, the three right columns show the generated results, each conditioned on a reference exemplar (denoted by a red border).
  • Figure 2: Comparison of inversion-based editing methods and our ReInversion. (a) Conventional inversion maps a source image to its approximate starting noise and generates under text guidance. (b) Inversion for EIE naively replaces the text condition with an exemplar, which leads to noise drift and undesirable editing results. (c) Reconstruction-Based Inversion (Recon-Inv) leverages the model’s forward reconstruction velocities to obtain a drift-free, reliable inversion from the source image. (d) Our ReInversion reformulates Recon-Inv into a $1\times$NFEs (Number of Function Evaluations) process and achieves faithful EIE.
  • Figure 3: Reconstruction results of Flux-Kontext. Top: source images; bottom: reconstructions. The average per-pixel $L_2$ error is 0.016 on a [0,1] scale.
  • Figure 4: Qualitative comparisons with SOTA inversion-based methods. Our ReInversion (the last column) demonstrates superior performance compared to existing flow-based inversion methods.
  • Figure 5: Ablation study on the effect of the hyperparameter $t_\tau$.
  • ...and 6 more figures