Steering Rectified Flow Models in the Vector Field for Controlled Image Generation

Maitreya Patel, Song Wen, Dimitris N. Metaxas, Yezhou Yang

TL;DR

FlowChef unifies controlled image generation with Rectified Flow Models by steering the denoising trajectory in the vector field using gradient skipping, enabling efficient, inversion-free handling of linear inverse problems, image editing, and classifier guidance. The authors provide both a theoretical analysis of error dynamics and practical algorithms, showing that FlowChef can outperform baselines in speed, memory, and quality while scaling to large latent models and state-of-the-art models such as Flux. Key contributions include a gradient-approximate update rule under local linearity and slow Jacobian variation, empirical validation across pixel- and latent-space tasks, and a broad demonstration of inversion-free editing and editing-style transfer. The work offers a practical, resource-efficient framework with broad applicability and potential for extension to video and 3D synthesis, alongside considerations for ethical use.

Abstract

Diffusion models (DMs) excel in photorealism, image editing, and solving inverse problems, aided by classifier-free guidance and image inversion techniques. However, rectified flow models (RFMs) remain underexplored for these tasks. Existing DM-based methods often require additional training, lack generalization to pretrained latent models, underperform, and demand significant computational resources due to extensive backpropagation through ODE solvers and inversion processes. In this work, we first develop a theoretical and empirical understanding of the vector field dynamics of RFMs in efficiently guiding the denoising trajectory. Our findings reveal that we can navigate the vector field in a deterministic and gradient-free manner. Utilizing this property, we propose FlowChef, which leverages the vector field to steer the denoising trajectory for controlled image generation tasks, facilitated by gradient skipping. FlowChef is a unified framework for controlled image generation that, for the first time, simultaneously addresses classifier guidance, linear inverse problems, and image editing without the need for extra training, inversion, or intensive backpropagation. Finally, we perform extensive evaluations and show that FlowChef significantly outperforms baselines in terms of performance, memory, and time requirements, achieving new state-of-the-art results. Project Page: https://flowchef.github.io
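
The steering idea described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. It assumes Gaussian toy data so that the rectified-flow velocity has a closed form standing in for a trained model, and it interprets "gradient skipping" as taking the gradient of the cost with respect to the denoised estimate $\hat{x}_0$ and applying it directly to $x_t$, with no backpropagation through the velocity model. The names `velocity`, `sample`, `scale`, and `n_steps`, and all numeric values, are hypothetical choices made for this illustration.

```python
import numpy as np


def velocity(x, t, mu, sigma):
    """Closed-form velocity E[x1 - x0 | x_t] for toy data x0 ~ N(mu, sigma^2 I).

    Assumes the linear path x_t = (1 - t) * x0 + t * x1 with x1 ~ N(0, I).
    This stands in for a trained rectified flow model (an assumption).
    """
    var_t = (1.0 - t) ** 2 * sigma ** 2 + t ** 2
    centered = x - (1.0 - t) * mu
    e_x1 = t / var_t * centered                            # E[x1 | x_t]
    e_x0 = mu + (1.0 - t) * sigma ** 2 / var_t * centered  # E[x0 | x_t]
    return e_x1 - e_x0


def sample(x1, mu, sigma, x_ref=None, n_steps=50, scale=0.4):
    """Euler sampling from t = 1 (noise) to t = 0 (data), optionally steered."""
    x = x1.copy()
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        if x_ref is not None:
            # One-step denoised estimate from the current point.
            x0_hat = x - t * velocity(x, t, mu, sigma)
            # Gradient of ||x0_hat - x_ref||^2 with respect to x0_hat only.
            grad = 2.0 * (x0_hat - x_ref)
            # Skip the Jacobian d(x0_hat)/d(x_t): apply the gradient to x_t.
            x = x - scale * grad
        # Plain Euler step along the velocity field toward t_next.
        x = x - (t - t_next) * velocity(x, t, mu, sigma)
    return x


rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0])      # toy data mean (assumption)
sigma = 1.0                    # toy data standard deviation (assumption)
x_ref = np.array([3.0, -3.0])  # hypothetical target sample
x1 = rng.standard_normal(2)    # initial noise

unsteered = sample(x1, mu, sigma)
steered = sample(x1, mu, sigma, x_ref=x_ref)

unsteered_err = float(np.linalg.norm(unsteered - x_ref))
steered_err = float(np.linalg.norm(steered - x_ref))
print("unsteered error:", unsteered_err)
print("steered error:  ", steered_err)
```

In this sketch the velocity is recomputed after the steering update before the Euler step is taken; whether the original method reuses the earlier evaluation is not stated in this summary, so that choice is an assumption as well.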

Paper Structure

This paper contains 51 sections, 4 theorems, 39 equations, 20 figures, 9 tables, 3 algorithms.

Key Result

Proposition 4.1

Let $p_1 \sim \mathcal{N}(0, \mathcal{I})$ be the noise distribution and $p_0$ be the data distribution. Let $x_t$ denote an intermediate sample obtained from a predefined forward function $q$ as $x_t = q(x_0, x_1, t)$, where $x_0 \sim p_0$ and $x_1 \sim p_1$. Define an ODE sampling process $dx(t) = \ldots$ [...] where $e(t) = \hat{x}_0 - x_0^{ref}$, $E(t) = e(t)^T e(t)$ is the squared error magnitude, $s > 0$ [...]

Figures (20)

  • Figure 1: FlowChef steers the trajectory of Rectified Flow Models during inference to tackle linear inverse problems, image editing, and classifier guidance. We extend FlowChef to SOTA models like Flux and InstaFlow, enabling gradient- and inversion-free control for efficient, controlled image generation.
  • Figure 2: Motivation behind FlowChef based on rectified flow models' trajectory space. Let $p_1 \sim N(0,I)$ and $p_0$ be distributions, with $x_1 \sim p_1$ as initial noise, $x_0^{ref}$ as the target sample, $\hat{x}_0$ as the denoised sample from $x_1$, and $x_1^{ref}$ as the specific noise leading to $x_0^{ref}$. (a) Stochasticity and nonlinear trajectories with crossovers can complicate gradient estimation at each denoising step $t$. (b) The inference-time trajectory of D-Flow (baseline) requires backpropagation through all denoising steps. (c) Our method FlowChef enables efficient trajectory steering to guide $x_t$ along the trajectory towards $x_0^{ref}$.
  • Figure 3: Illustration of the impact of the guided control step on Flux.1[Dev] with mean squared error as the cost function ($\mathcal{L} = \| \hat{x}_0 - x_0^{ref} \|^2_2$). This shows that FlowChef can guide rectified flow models on the fly without requiring either gradients through the Flux model or inversion. Note that the convergence speed is slowed down for illustration purposes.
  • Figure 4: Qualitative results on linear inverse problems. All baselines are implemented on Stable Diffusion v1.5, except the FlowChef Flux variant. VRAM and time are reported on an A100 GPU at 512 x 512 resolution, with Flux experiments at 1024 x 1024. Best viewed when zoomed in.
  • Figure 5: Qualitative results on image editing. As illustrated, our method attains SOTA performance among the compared inversion-free methods, while the FlowChef (Flux) variant achieves better quality and edits.
  • ...and 15 more figures

Theorems & Definitions (8)

  • Proposition 4.1
  • Lemma 4.2: Gradient Relationship
  • Theorem 4.3
  • proof
  • proof
  • proof
  • Proposition 10.1
  • proof