UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models

Guanlong Jiao, Biqing Huang, Kuan-Chieh Wang, Renjie Liao

TL;DR

UniEdit-Flow tackles inversion and editing for flow matching models by introducing Uni-Inv, a high-precision inversion method, and Uni-Edit, a region-aware editing approach. The framework uses a predictor–corrector design combined with region-adaptive guidance and velocity fusion, enabling robust, low-cost, tuning-free operations across flow and diffusion models. Empirical results demonstrate state-of-the-art inversion accuracy and editing quality, with applications ranging from sketch-to-image to video editing. This work advances practical image editing in the era of flow-based generative models by preserving editing-irrelevant regions while delivering strong, controllable edits.

Abstract

Flow matching models have emerged as a strong alternative to diffusion models, but existing inversion and editing methods designed for diffusion are often ineffective or inapplicable to them. The straight-line, non-crossing trajectories of flow models pose challenges for diffusion-based approaches but also open avenues for novel solutions. In this paper, we introduce a predictor-corrector-based framework for inversion and editing in flow models. First, we propose Uni-Inv, an effective inversion method designed for accurate reconstruction. Building on this, we extend the concept of delayed injection to flow models and introduce Uni-Edit, a region-aware, robust image editing approach. Our methodology is tuning-free, model-agnostic, efficient, and effective, enabling diverse edits while ensuring strong preservation of edit-irrelevant regions. Extensive experiments across various generative models demonstrate the superiority and generalizability of Uni-Inv and Uni-Edit, even under low-cost settings. Project page: https://uniedit-flow.github.io/

Paper Structure

This paper contains 27 sections, 1 theorem, 22 equations, 18 figures, 4 tables, 2 algorithms.

Key Result

Proposition 4.1

Suppose the velocity field $\boldsymbol{v}_\theta$ is Lipschitz, and there is a constant $C$ such that $\left\Vert \boldsymbol{Z}_{t_p} - \boldsymbol{Z}_{t_q} \right\Vert \leq C \left\Vert t_p - t_q \right\Vert, \forall t_p,t_q \in [0, 1]$, where $\boldsymbol{Z}_{t_p}$ and $\boldsymbol{Z}_{t_q}$ com

Figures (18)

  • Figure 1: UniEdit-Flow for image inversion and editing. Our approach proposes a highly accurate, efficient, model-agnostic, training- and tuning-free sampling strategy for flow models to tackle image inversion and editing problems. Cluttered scenes are difficult to invert and reconstruct, causing various methods to fail. Our Uni-Inv achieves exact reconstruction even in such complex situations (first row). Furthermore, existing flow editing methods retain undesirable effects, whereas our region-aware, sampling-based Uni-Edit showcases excellent performance in both editing and background preservation (second row).
  • Figure 2: Delayed injection, which retains the source condition during the early denoising steps and introduces the edit condition at a middle timestep (illustrated in the bottom part), is a widely used technique in diffusion-based editing (top row). However, when applied to flow models (second row), it is ineffective. While flow-based editing exhibits a mild tendency toward the target edit, it fails to produce sufficiently strong effects.
  • Figure 3: An overview of our proposed Uni-Inv and Uni-Edit (bird $\rightarrow$ red bird). (a) indicates that vanilla flow inversion is incapable of both exact image inversion and controllable editing. (b) demonstrates our proposed Uni-Inv and Uni-Edit, which perform efficient and effective inversion and editing.
  • Figure 4: Per-step error of the velocities and samples of vanilla inversions. We first synthesize an image $\boldsymbol{Z}_0$, then conduct vanilla inversion to obtain inverted noises $\boldsymbol{Z}_1$ using per-step velocities of $\boldsymbol{v}_\theta(\boldsymbol{\widehat{Z}}_{t_{i-1}}, t_{i-1})$ ($\blacklozenge$) and $\boldsymbol{v}_\theta(\boldsymbol{\widehat{Z}}_{t_{i-1}}, t_{i})$ ($\blacksquare$), respectively. We plot the per-step local error of the samples ($\Delta \boldsymbol{Z}$) and velocities ($\Delta \boldsymbol{v}$). The right panel visualizes the various $\boldsymbol{Z}_1$; their border colors correspond to the different conditions (black for the initial noise).
  • Figure 5: Demonstration of various sampling-based image editing methods (dog $\rightarrow$ lion). Directly using $\boldsymbol{c}^T$ as the condition leads to excessive editing. Delayed injection, which is widely used in diffusion-based methods, inevitably yields incomplete edits when applied to deterministic models. Our Uni-Edit mitigates the components obtained in early steps that are not conducive to editing, ultimately achieving satisfying results.
  • ...and 13 more figures
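
The vanilla inversion setup described in the Figure 4 caption can be illustrated on a toy problem. The sketch below is only an illustration under stated assumptions: `velocity` is a hypothetical linear, time-dependent field standing in for a trained $\boldsymbol{v}_\theta$, and the helper names `euler_sample`, `vanilla_invert`, and `round_trip_error` are made up for this example. It is not the paper's Uni-Inv. It synthesizes $\boldsymbol{Z}_0$ from a noise $\boldsymbol{Z}_1$ with Euler steps, inverts it with either velocity evaluation from the caption, and measures how far the recovered noise lands from the original.

```python
import numpy as np


def velocity(z, t):
    # Hypothetical toy velocity field (an assumption, NOT a trained flow model):
    # Lipschitz in z and smooth in t.
    return (1.0 + t) * z


def euler_sample(z1, num_steps, v=velocity):
    """Euler integration from noise (t = 1) to data (t = 0)."""
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    z = np.asarray(z1, dtype=float).copy()
    for i in range(num_steps, 0, -1):
        h = ts[i] - ts[i - 1]
        z = z - h * v(z, ts[i])  # step from t_i down to t_{i-1}
    return z


def vanilla_invert(z0, num_steps, use_next_time, v=velocity):
    """Euler integration from data (t = 0) back to noise (t = 1).

    use_next_time=False evaluates v(Z_{t_{i-1}}, t_{i-1});
    use_next_time=True  evaluates v(Z_{t_{i-1}}, t_i).
    """
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    z = np.asarray(z0, dtype=float).copy()
    for i in range(1, num_steps + 1):
        h = ts[i] - ts[i - 1]
        t_eval = ts[i] if use_next_time else ts[i - 1]
        z = z + h * v(z, t_eval)  # step from t_{i-1} up to t_i
    return z


def round_trip_error(num_steps, use_next_time, seed=0):
    """Norm of (inverted noise - original noise) after sample-then-invert."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(4)
    z0 = euler_sample(z1, num_steps)
    z1_hat = vanilla_invert(z0, num_steps, use_next_time)
    return float(np.linalg.norm(z1_hat - z1))


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, round_trip_error(n, False), round_trip_error(n, True))
```

In this toy setting neither variant recovers the initial noise exactly, because the sampler's step used the velocity at $\boldsymbol{Z}_{t_i}$, which is not available during inversion. The gap shrinks as the number of steps grows, but it does not vanish at low step counts.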

Theorems & Definitions (1)

  • Proposition 4.1