Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method

Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Lanqing Guo, Yuehao Wang, Peihao Wang, Zhangyang Wang

TL;DR

The paper introduces a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, and visual prompt-guided image editing. It reports quantitative results on image enhancement, makeup transfer, reconstruction quality, and guided sampling quality.

Abstract

We explore the oscillatory behavior observed in inversion methods applied to large-scale text-to-image diffusion models, with a focus on the "Flux" model. By employing a fixed-point-inspired iterative approach to invert real-world images, we observe that the solution does not achieve convergence, instead oscillating between distinct clusters. Through both toy experiments and real-world diffusion models, we demonstrate that these oscillating clusters exhibit notable semantic coherence. We offer theoretical insights, showing that this behavior arises from oscillatory dynamics in rectified flow models. Building on this understanding, we introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing. Furthermore, we provide quantitative results demonstrating the effectiveness of our method for tasks such as image enhancement, makeup transfer, reconstruction quality, and guided sampling quality. Higher-quality examples of videos and images are available at https://yanyanzheng96.github.io/oscillation_inversion/.

Paper Structure

This paper contains 27 sections, 6 theorems, 69 equations, 15 figures, 3 tables.

Key Result

Lemma 1

Let $\pi_1$ be the source distribution and $\pi_0$ be the target distribution that the rectified flow transports, following the notation of the theory section. The function $f$ is given by $f(z) = y + \gamma v^X(z,\gamma)$, where $v^X(x,t) = \mathbb{E}[X_0 - X_1 \mid X_t = x]$ and $\pi_t(x)$ is the probability density function of $X_t$.
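
To make the map in Lemma 1 concrete, the following is a minimal sketch of iterating $z^{(k+1)} = f(z^{(k)}) = y + \gamma v^X(z^{(k)}, \gamma)$ in one dimension. The velocity `toy_velocity` is a hypothetical stand-in chosen so that the fixed point is unstable. It is not a trained rectified flow, and the values of `y`, `gamma`, and `z0` are illustrative assumptions rather than settings from the paper.

```python
import numpy as np


def toy_velocity(z, t):
    """Hypothetical stand-in for v^X(z, t); NOT a trained rectified flow.

    The time argument t is accepted only to mirror the signature in Lemma 1.
    """
    return -4.0 * np.tanh(2.0 * z)


def fixed_point_iterates(y, gamma, z0, num_iters, velocity=toy_velocity):
    """Return the iterates z^(0), ..., z^(num_iters) of z <- y + gamma * v(z, gamma)."""
    z = float(z0)
    traj = [z]
    for _ in range(num_iters):
        # One application of f(z) = y + gamma * v(z, gamma).
        z = y + gamma * float(velocity(z, gamma))
        traj.append(z)
    return np.array(traj)


# Illustrative (assumed) values: y = 0, gamma = 0.5, starting point z0 = 0.1.
trajectory = fixed_point_iterates(y=0.0, gamma=0.5, z0=0.1, num_iters=50)
print(trajectory[-4:])
```

With this toy field, $f(z) = -2\tanh(2z)$ has slope of magnitude 4 at its fixed point $z = 0$, so the iterates move away from it and end up alternating between two values of opposite sign. This is only a one-dimensional analogue of oscillating between clusters instead of converging.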

Figures (15)

  • Figure 1: Oscillation Inversion is a phenomenon observed in large flow models. Building on this, we developed a simple and fast method that serves as a distribution transfer technique, enabling image enhancement as well as low-level editing, e.g. stroke-based relighting and recoloring.
  • Figure 2: (a) Fixed-point iteration causes oscillation, producing subdomains with opposite features; for the brown-skinned girl, this results in more tan and red tones. (b) The oscillation can be customized to achieve a desired separation, such as a change in hair color.
  • Figure 3: By training flow matching on the distribution displayed in Fig. \ref{fig:mixturegaussian}, we demonstrate that the oscillation inversion phenomenon observed in large flow models aligns well with that seen in toy data. The period $m$ indicates that our group inversion starts with $m$ initial $y$s. Row (a) shows the trajectory of inverted latents, $z_{t_0}^{(k+1)}$. Row (b) shows the one-step prediction back to the input space from these latents. Row (c) presents the quantitative distance along the latents' trajectory, reflecting clear periodic patterns.
  • Figure 4: This is an example of group inversion, where the high-quality distribution is triggered by two degenerate distributions. We also apply this method to large-scale experiments in Section \ref{sec:largescale}.
  • Figure 5: Gradual evolution of two oscillating distributions occurs under the influence of visual prompts during fine-tuning. We utilize this method in the experiments discussed in Section \ref{section:makeup}.
  • ...and 10 more figures

Theorems & Definitions (9)

  • Lemma 1
  • Theorem 1: Informal
  • Theorem 2: Informal
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof