Rectified-CFG++ for Flow Based Models

Shreshth Saini, Shashank Gupta, Alan C. Bovik

TL;DR

Rectified-CFG++ addresses off-manifold drift in CFG-guided flow-based models with a geometry-aware predictor-corrector sampler: each step first follows the conditional velocity, then applies a time-scheduled correction that interpolates between the conditional and unconditional velocity fields. The method preserves marginal consistency and keeps trajectories near the data manifold, which gives it both theoretical guarantees and practical stability. Empirically, it improves FID, CLIP score, and human-preference metrics across Flux, Stable Diffusion 3/3.5, and Lumina on MS-COCO, LAION-Aesthetic, and T2I-CompBench, while reducing artifacts and improving text alignment. As a training-free, drop-in upgrade with negligible added compute, Rectified-CFG++ offers a scalable path to higher-quality, more reliable text-to-image generation with large RF backbones.

Abstract

Classifier-free guidance (CFG) is the workhorse for steering large diffusion models toward text-conditioned targets, yet its native application to rectified flow (RF) based models provokes severe off-manifold drift, yielding visual artifacts, text misalignment, and brittle behaviour. We present Rectified-CFG++, an adaptive predictor-corrector guidance that couples the deterministic efficiency of rectified flows with a geometry-aware conditioning rule. Each inference step first executes a conditional RF update that anchors the sample near the learned transport path, then applies a weighted conditional correction that interpolates between conditional and unconditional velocity fields. We prove that the resulting velocity field is marginally consistent and that its trajectories remain within a bounded tubular neighbourhood of the data manifold, ensuring stability across a wide range of guidance strengths. Extensive experiments on large-scale text-to-image models (Flux, Stable Diffusion 3/3.5, Lumina) show that Rectified-CFG++ consistently outperforms standard CFG on benchmark datasets such as MS-COCO, LAION-Aesthetic, and T2I-CompBench. Project page: https://rectified-cfgpp.github.io/
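The two-stage step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the half-step predictor is inferred from the $t-\Delta t/2$ notation used in Lemma 3.1, time is assumed to run from $t=1$ (noise) to $t=0$ (data) with update $x_{t-\Delta t} = x_t - \Delta t\, v$, and the names `rectified_cfgpp_step`, `sample`, `alpha_schedule`, and the toy velocity fields are hypothetical.

```python
import numpy as np


def rectified_cfgpp_step(x, t, dt, v_cond, v_uncond, alpha_t):
    """One predictor-corrector step (illustrative sketch, not the reference code)."""
    # Predictor: half step along the conditional velocity only.
    v_c = v_cond(x, t)
    x_pred = x - 0.5 * dt * v_c
    t_mid = t - 0.5 * dt
    # Guidance direction evaluated at the predicted state.
    delta_v = v_cond(x_pred, t_mid) - v_uncond(x_pred, t_mid)
    # Corrector: conditional velocity plus a weighted guidance term.
    v_hat = v_c + alpha_t * delta_v
    return x - dt * v_hat


def sample(x_init, num_steps, v_cond, v_uncond, alpha_schedule):
    """Integrate from t=1 down to t=0 with a fixed step size."""
    x = np.asarray(x_init, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        x = rectified_cfgpp_step(x, t, dt, v_cond, v_uncond, alpha_schedule(t))
    return x


# Toy velocity fields (assumption): straight-line flows toward a single point.
MU_C = np.array([2.0, 0.0])  # hypothetical conditional target
MU_U = np.array([0.0, 0.0])  # hypothetical unconditional target


def toy_v_cond(x, t):
    return (x - MU_C) / t


def toy_v_uncond(x, t):
    return (x - MU_U) / t


x0 = np.array([0.5, -1.0])
# With a zero guidance weight the step reduces to a plain conditional Euler update.
print(sample(x0, 10, toy_v_cond, toy_v_uncond, lambda t: 0.0))
# A hypothetical time-dependent weight; the paper's actual schedule may differ.
print(sample(x0, 10, toy_v_cond, toy_v_uncond, lambda t: 0.5 * (1.0 - t)))
```

In a real pipeline, `v_cond` and `v_uncond` would be two forward passes of the same network, with and without the text prompt. In this sketch the guidance term is evaluated at the predicted state rather than at the current one, which is how it differs from a standard CFG update.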

Paper Structure

This paper contains 50 sections, 6 theorems, 36 equations, 22 figures, 15 tables, 2 algorithms.

Key Result

Lemma 3.1

Under assumptions (A1) and (A4), the guidance direction $\Delta v^{\theta}_{t-\Delta t/2}$ computed at the predicted state $\tilde{x}_{t-\Delta t/2}$ differs from the guidance direction $\Delta v^{\theta}_t(x_t)$ at the current state by an amount proportional to the step size $\Delta t$:
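For intuition, the scaling in this lemma can be checked numerically on toy fields. The sketch below assumes that (A1) and (A4) amount to Lipschitz-type regularity of the velocity fields in $x$ and $t$ (they are not spelled out in this summary), uses a half-step predictor along the conditional velocity, and defines smooth hypothetical fields `v_cond` and `v_uncond` purely for illustration. For these particular fields the gap is bounded by $2\,\Delta t$, which the code checks.

```python
import numpy as np


# Hypothetical smooth, bounded velocity fields (not taken from the paper).
def v_cond(x, t):
    return np.tanh(x) + t


def v_uncond(x, t):
    return 0.5 * np.sin(x)


def guidance_direction(x, t):
    # Difference between conditional and unconditional velocities.
    return v_cond(x, t) - v_uncond(x, t)


def guidance_gap(x, t, dt):
    """Largest change in guidance direction between x_t and the predicted state."""
    x_pred = x - 0.5 * dt * v_cond(x, t)  # half-step predictor (assumed convention)
    at_pred = guidance_direction(x_pred, t - 0.5 * dt)
    at_curr = guidance_direction(x, t)
    return float(np.max(np.abs(at_pred - at_curr)))


x = np.linspace(-2.0, 2.0, 9)
t = 0.8
gaps = {dt: guidance_gap(x, t, dt) for dt in (0.2, 0.1, 0.05, 0.025)}
for dt, gap in gaps.items():
    # The ratio gap / dt should stay bounded as dt shrinks.
    print(f"dt={dt:<6} gap={gap:.5f} gap/dt={gap / dt:.3f}")
```

The constant 2 is specific to these toy fields: the guidance direction has slope at most 1.5 in $x$ and 1 in $t$, and the predictor moves $x$ by at most $0.9\,\Delta t$ at $t = 0.8$.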

Figures (22)

  • Figure 1: Effect of guidance on flow-based models. (Left) Unguided samples lack structure. (Middle) Naive CFG introduces semantic drift and artifacts. (Right) Rectified-CFG++ yields detailed and well-aligned outputs.
  • Figure 2: Comparison of intermediate denoising steps of CFG and Rectified-CFG++. Visual progression of decoded latents across 7 sampling steps, from $t{=}1000$ (top left) to $t{=}0$ (top right). While CFG led to artifacts and structural instability early on, Rectified-CFG++ maintained on-manifold transitions and preserved fine textures throughout.
  • Figure 3: T2I results from Flux [flux2024] on Pick-a-Pic [kirstain2023pick] prompts.
  • Figure 4: Guidance strategy comparison on SD3.5 [sd32024].
  • Figure 5: Comparison of CFG vs Rectified-CFG++ combined with SD3/3.5 [sd32024] and Lumina [lumina-2] on diverse prompts. Rectified-CFG++ consistently improves semantic alignment, compositional balance, and generative fidelity across models and scenes.
  • ...and 17 more figures

Theorems & Definitions (11)

  • Lemma 3.1: Stability of Predicted Guidance Direction
  • proof
  • Proposition 1: Bounded Single-Step Perturbation
  • proof
  • Lemma A.1: Manifold-Faithful Corrector
  • Lemma A.2: Stability of Predicted Guidance Direction
  • proof
  • Proposition 2: Bounded Single-Step Perturbation
  • proof
  • Proposition 3: Bounded Distributional Deviation
  • ...and 1 more