Free Lunch for Stabilizing Rectified Flow Inversion

Chenru Wang, Beier Zhu, Chi Zhang

TL;DR

This work addresses instability in the inversion of Rectified Flow (RF)-based generative models, where approximation errors accumulate across timesteps. It introduces Proximal-Mean Inversion (PMI), a training-free gradient correction that nudges the velocity field toward a running average within a theoretically derived spherical radius to stabilize inversion and reconstruction. Complementarily, mimic-CFG provides a lightweight, CFG-inspired velocity correction for editing by interpolating toward the history-average direction, balancing editing fidelity with structural consistency. On PIE-Bench, PMI and mimic-CFG achieve state-of-the-art reconstruction and editing performance with fewer neural function evaluations, while remaining training-free and broadly applicable to RF-based models. Together, these methods deliver practical, efficient, and theoretically grounded improvements to RF-based inversion tasks.

Abstract

Rectified-Flow (RF)-based generative models have recently emerged as strong alternatives to traditional diffusion models, demonstrating state-of-the-art performance across various tasks. By learning a continuous velocity field that transforms simple noise into complex data, RF-based models not only enable high-quality generation but also support training-free inversion, which facilitates downstream tasks such as reconstruction and editing. However, existing inversion methods, such as vanilla RF-based inversion, suffer from approximation errors that accumulate across timesteps, leading to unstable velocity fields and degraded reconstruction and editing quality. To address this challenge, we propose Proximal-Mean Inversion (PMI), a training-free gradient correction method that stabilizes the velocity field by guiding it toward a running average of past velocities, constrained within a theoretically derived spherical Gaussian. Furthermore, we introduce mimic-CFG, a lightweight velocity correction scheme for editing tasks, which interpolates between the current velocity and its projection onto the historical average, balancing editing effectiveness and structural consistency. Extensive experiments on PIE-Bench demonstrate that our methods significantly improve inversion stability, image reconstruction quality, and editing fidelity, while reducing the required number of neural function evaluations. Our approach achieves state-of-the-art performance on PIE-Bench with enhanced efficiency and theoretical soundness.
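To make the proximal-mean idea concrete, here is a minimal sketch under stated assumptions. It assumes the correction is the proximal point of $g(u) = \tfrac{\lambda}{2}\|u - \bar{v}\|^2$ evaluated at the current velocity $v$, which gives $(v + \lambda\bar{v})/(1+\lambda)$, and that it fires only when $v$ lies farther than a hypothetical `radius` from the running mean $\bar{v}$ of past velocities. The names `proximal_mean_step` and `invert`, the gating rule, and the `velocity(z, t)` callable are illustrative assumptions, not the paper's algorithm, and the radius here is a free parameter rather than the one derived in Proposition 1. The parameter `lam` plays the role of the proximal operator parameter $\lambda$ from Figure B.1.

```python
import numpy as np

def proximal_mean_step(v, v_mean, lam, radius):
    """Hypothetical PMI-style correction (assumed form, not the paper's exact rule).

    If v is within `radius` of the running mean, keep it. Otherwise return the
    proximal point of g(u) = (lam / 2) * ||u - v_mean||^2 at v:
        argmin_u g(u) + 0.5 * ||u - v||^2 = (v + lam * v_mean) / (1 + lam).
    Larger lam pulls v closer to v_mean (stronger correction).
    """
    if np.linalg.norm(v - v_mean) <= radius:
        return v
    return (v + lam * v_mean) / (1.0 + lam)

def invert(z0, velocity, timesteps, lam=0.0, radius=np.inf):
    """Euler inversion z <- z + (t_i - t_{i-1}) * v with an optional proximal-mean
    correction. `velocity(z, t)` is a hypothetical stand-in for an RF model."""
    z = np.array(z0, dtype=float)
    v_sum = np.zeros_like(z)  # accumulates the (possibly corrected) velocities used so far
    for i in range(1, len(timesteps)):
        t_prev, t_cur = timesteps[i - 1], timesteps[i]
        v = velocity(z, t_prev)
        if i > 1:  # a running mean needs at least one past velocity
            v_mean = v_sum / (i - 1)
            v = proximal_mean_step(v, v_mean, lam, radius)
        v_sum += v
        z = z + (t_cur - t_prev) * v
    return z
```

With the default `lam=0.0` and `radius=np.inf` the loop reduces to plain Euler inversion, which makes it easy to compare corrected and uncorrected trajectories on a toy velocity field.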

Paper Structure (50 sections, 3 theorems, 60 equations, 16 figures, 11 tables, 8 algorithms)

Key Result

Proposition 1

(Stability Condition) Suppose the inverted latent vector $\hat{\mathbf{z}}_1 \in \mathbb{R}^n$ follows a Gaussian distribution, and define $r = \sqrt{n}\,\sigma$, where $n$ is the latent dimension and $\sigma > 0$ is a scaling factor. The radius that places the sampling points within the high-density region satisfies …, where $T$ is the total time, $\Delta t_i = t_i - t_{i-1}$, $\epsilon > 0$, and …
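The choice $r = \sqrt{n}\,\sigma$ reflects a standard property of high-dimensional Gaussians: if $\hat{\mathbf{z}}_1 \sim \mathcal{N}(\mathbf{0}, \sigma^2 I_n)$, then $\mathbb{E}\|\hat{\mathbf{z}}_1\|^2 = n\sigma^2$, and the norm concentrates tightly around $\sqrt{n}\,\sigma$ as $n$ grows. The short numerical check below illustrates this concentration. The zero mean, isotropic covariance, dimensions, and sample count are assumptions chosen for illustration, and the helper `norm_ratio_stats` is hypothetical; the sketch does not reproduce the proposition's bound.

```python
import numpy as np

def norm_ratio_stats(n, sigma, num_samples=1000, seed=0):
    """Sample z ~ N(0, sigma^2 I_n) (assumed isotropic, zero mean) and return the
    mean and std of ||z|| / (sqrt(n) * sigma) over the samples."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma, size=(num_samples, n))
    ratios = np.linalg.norm(z, axis=1) / (np.sqrt(n) * sigma)
    return ratios.mean(), ratios.std()

for n in (4, 64, 4096):
    mean, std = norm_ratio_stats(n, sigma=1.0)
    print(f"n={n:5d}  mean ||z||/(sqrt(n)*sigma) = {mean:.3f}  std = {std:.3f}")
```

The relative spread shrinks roughly like $1/\sqrt{2n}$, which is why a radius proportional to $\sqrt{n}\,\sigma$ describes where most of the Gaussian mass lies in high dimensions.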

Figures (16)

  • Figure 1: Qualitative comparison results for the inversion and reconstruction task on PIE-Bench. Baselines enhanced with our method consistently outperform their vanilla counterparts, particularly under the unconditional setting.
  • Figure 2: Qualitative comparison results for the editing task on PIE-Bench. The leftmost column shows the source images. Each of the three columns on the right contains two sets of results: the left side shows outputs from the vanilla baseline, while the right side shows outputs enhanced with our PMI and mimic-CFG methods. The results show that our methods improve background preservation and editing quality across editing categories.
  • Figure 3: Qualitative comparison of editing results on PIE-Bench (first three) and real-world (last) images. Our method achieves superior adaptation in both synthetic and real-world editing scenarios.
  • Figure 4: Analysis experiment on the parameter $w$ in mimic-CFG. PSNR and SSIM improve with moderate correction, while overcorrection (small $w$) harms both background preservation and editing quality.
  • Figure B.1: Analysis experiment on the effect of the proximal operator parameter $\lambda$. Increasing $\lambda$ leads to improvements in PSNR and SSIM, while excessively large values result in overcorrection, compromising editing quality.
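The abstract describes mimic-CFG as interpolating between the current velocity and its projection onto the historical average, and Figure 4 reports that small $w$ overcorrects. One update consistent with both descriptions, offered here only as an assumption, is a CFG-style rule $\tilde{v} = v_{\parallel} + w\,(v - v_{\parallel})$, where $v_{\parallel}$ is the projection of $v$ onto the direction of the running mean $\bar{v}$. Under this form, $w = 1$ leaves $v$ unchanged and $w = 0$ keeps only the projection. The function name `mimic_cfg` and the exact formula are hypothetical.

```python
import numpy as np

def mimic_cfg(v, v_mean, w, eps=1e-12):
    """Hypothetical mimic-CFG-style correction (assumed form, not the paper's exact rule).

    v_par is the projection of v onto the direction of the running mean v_mean.
    Returns v_par + w * (v - v_par): w = 1 keeps v, w = 0 keeps only v_par,
    so a smaller w pulls v harder toward the historical direction.
    """
    denom = float(np.dot(v_mean, v_mean))
    if denom < eps:  # no meaningful history direction yet; leave v unchanged
        return v
    v_par = (np.dot(v, v_mean) / denom) * v_mean
    return v_par + w * (v - v_par)
```

For example, with `v = [1, 1]` and `v_mean = [2, 0]`, the projection is `[1, 0]`, and `w = 0.5` halves the component orthogonal to the history direction, giving `[1, 0.5]`.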

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof