Reflective Flow Sampling Enhancement

Zikai Zhou, Muyao Wang, Shitong Shao, Lichen Bai, Haoyi Xiong, Bo Han, Zeke Xie

TL;DR

Reflective Flow Sampling (RF-Sampling) is a theoretically grounded, training-free inference enhancement framework designed explicitly for flow models, especially CFG-distilled variants (i.e., models with classifier-free guidance (CFG) distilled into them).

Abstract

The growing demand for text-to-image generation has led to rapid advances in generative modeling. Recently, text-to-image diffusion models trained with flow matching algorithms, such as FLUX, have achieved remarkable progress and emerged as strong alternatives to conventional diffusion models. At the same time, inference-time enhancement strategies have been shown to improve the generation quality and text-prompt alignment of text-to-image diffusion models. However, these techniques mainly apply to conventional diffusion models and usually perform poorly on flow models. To bridge this gap, we propose Reflective Flow Sampling (RF-Sampling), a theoretically grounded, training-free inference enhancement framework designed explicitly for flow models, especially CFG-distilled variants (i.e., models with classifier-free guidance (CFG) distilled into them) such as FLUX. Departing from heuristic interpretations, we provide a formal derivation proving that RF-Sampling implicitly performs gradient ascent on the text-image alignment score. By leveraging a linear combination of textual representations together with flow inversion, RF-Sampling lets the model explore regions of the noise space that are more consistent with the input prompt. Extensive experiments across multiple benchmarks demonstrate that RF-Sampling consistently improves both generation quality and prompt alignment. Moreover, RF-Sampling is the first inference enhancement method to exhibit, to some extent, test-time scaling on FLUX.
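The core mechanism described above, a linear combination of text embeddings used alongside flow inversion, can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' implementation: the hypothetical helpers `interpolate_embedding` and `reflective_step`, the CFG-style form c_null + s(c_text - c_null), the pairing of a strongly weighted denoising step with a weakly weighted inversion step, the sign convention, and the toy velocity field, which is affine in the embedding and ignores x and t so that a first-order expansion is exact.

```python
import numpy as np


def interpolate_embedding(c_null, c_text, s):
    """CFG-style linear combination (assumed form): c_null + s * (c_text - c_null)."""
    return c_null + s * (c_text - c_null)


def toy_velocity(x, t, c, M, b):
    """Hypothetical velocity field: affine in the embedding c, ignores x and t."""
    return M @ c + b


def reflective_step(x, t, c_null, c_text, s_high, beta_high, s_low, beta_low, M, b):
    """Hypothetical reflection: an Euler step under a strongly weighted embedding,
    then an inversion step of the opposite sign under a weakly weighted one."""
    c_high = interpolate_embedding(c_null, c_text, s_high)
    c_low = interpolate_embedding(c_null, c_text, s_low)
    # Denoising step (sign convention assumed: +v moves toward the data).
    x_mid = x + beta_high * toy_velocity(x, t, c_high, M, b)
    # Inversion step back toward the noise, using the weaker text weight.
    return x_mid - beta_low * toy_velocity(x_mid, t, c_low, M, b)


rng = np.random.default_rng(0)
d_x, d_c = 4, 3
M = rng.normal(size=(d_x, d_c))
b = rng.normal(size=d_x)
c_null, c_text = np.zeros(d_c), rng.normal(size=d_c)
x = rng.normal(size=d_x)

s_high, beta_high, s_low, beta_low = 3.0, 0.1, 1.0, 0.1
x_new = reflective_step(x, 0.5, c_null, c_text, s_high, beta_high, s_low, beta_low, M, b)

# First-order prediction: a null-prompt drift plus A times the text-induced velocity change.
A = s_high * beta_high - s_low * beta_low  # alignment coefficient, here about 0.2
predicted = x + (beta_high - beta_low) * (M @ c_null + b) + A * (M @ (c_text - c_null))
print(np.allclose(x_new, predicted))  # True
```

Because the toy field is affine in the embedding, the prediction matches exactly; for a real flow model it would hold only to first order, and with equal step sizes the null-prompt drift cancels, leaving only the term scaled by the alignment coefficient.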

Paper Structure

This paper contains 66 sections, 4 theorems, 37 equations, 42 figures, 15 tables, 1 algorithm.

Key Result

Theorem 1

Let the alignment score $J(x_t)$ be differentiable with gradient $\nabla_x J(x_t) \neq \mathbf{0}$. Assume the vector field $v_\theta(x, t, c)$ is locally Lipschitz continuous with respect to $x$ and differentiable with respect to $c$. Under the first-order Taylor expansion around the null prompt em... where $\mathcal{A} = s_{high}\beta_{high} - s_{low}\beta_{low} > 0$ is the alignment coefficient, a...
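The statement above is truncated, but one way to see how a coefficient of the form $\mathcal{A} = s_{high}\beta_{high} - s_{low}\beta_{low}$ can arise is sketched below. This is a reconstruction under assumptions, not the paper's proof: write $c_\varnothing$ for the null prompt embedding and $\delta c = c - c_\varnothing$ (our notation), assume weighted embeddings $c_s = c_\varnothing + s\,\delta c$, one denoising step of size $\beta_{high}$ under $c_{s_{high}}$ followed by one inversion step of size $\beta_{low}$ under $c_{s_{low}}$, and evaluate both velocities at $x_t$, which by the local Lipschitz assumption drops only a term of order $\beta_{high}\beta_{low}$. Linearizing $v_\theta$ in $c$ around $c_\varnothing$, with $J_c = \partial_c v_\theta(x_t, t, c_\varnothing)$:

$$
\begin{aligned}
\Delta x &= \beta_{high}\, v_\theta(x_t, t, c_{s_{high}}) - \beta_{low}\, v_\theta(x_t, t, c_{s_{low}}) \\
&\approx \beta_{high}\big[v_\theta(x_t, t, c_\varnothing) + s_{high}\, J_c\, \delta c\big] - \beta_{low}\big[v_\theta(x_t, t, c_\varnothing) + s_{low}\, J_c\, \delta c\big] \\
&= (\beta_{high} - \beta_{low})\, v_\theta(x_t, t, c_\varnothing) + \mathcal{A}\, J_c\, \delta c .
\end{aligned}
$$

With $\mathcal{A} > 0$, the net displacement has a positive component along the text-induced velocity change $J_c\,\delta c$. Relating that direction to $\nabla_x J(x_t)$ requires the further assumptions of Theorem 1, which are not reproduced here.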

Figures (42)

  • Figure 1: Qualitative comparisons with three representative flow models. Images for each prompt are synthesized using the same random seed. More visualization results are in the Appendix.
  • Figure 2: At the same time consumption, RF-Sampling outperforms standard sampling and significantly enhances the performance of FLUX-Lite and FLUX-Dev. As inference time increases, RF-Sampling continues to perform well, validating the scalability of our method. (A breakdown is given in a table in the Appendix.)
  • Figure 3: Illustration of RF-Sampling. Unlike previous methods, RF-Sampling interpolates text embeddings in a manner similar to traditional CFG, which enhances the model's generation quality and makes it better suited to flow models, especially CFG-distilled models.
  • Figure 4: The winning rate of RF-Sampling over other methods on SD3.5. The winning rate of standard sampling (the baseline) defaults to 50%. The results show the superiority of RF-Sampling in synthesizing high-quality images.
  • Figure 5: The winning rate of RF-Sampling over other methods on FLUX. The winning rate of standard sampling (the baseline) defaults to 50%. The results show the superiority of RF-Sampling in synthesizing high-quality images.
  • ...and 37 more figures
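Figures 4 and 5 report winning rates with the baseline fixed at 50%. Below is a minimal sketch of one common way to compute a pairwise winning rate, assuming one scalar quality score per prompt for each method and counting ties as half a win; the paper's actual evaluator and tie handling are not stated here, and `win_rate` and the scores are made up for illustration. Under this definition, comparing the baseline with itself yields exactly 50%.

```python
import numpy as np


def win_rate(scores_method, scores_baseline):
    """Fraction of prompts where the method's score beats the baseline's.
    Ties count as half a win. Scores must be aligned per prompt (assumption)."""
    a = np.asarray(scores_method, dtype=float)
    b = np.asarray(scores_baseline, dtype=float)
    if a.shape != b.shape:
        raise ValueError("score arrays must be aligned per prompt")
    wins = np.sum(a > b)
    ties = np.sum(a == b)
    return (wins + 0.5 * ties) / a.size


# Made-up per-prompt scores, for illustration only.
baseline = [0.21, 0.25, 0.19, 0.30]
method = [0.24, 0.25, 0.18, 0.33]
print(win_rate(baseline, baseline))  # 0.5
print(win_rate(method, baseline))    # 0.625 (2 wins, 1 tie, 1 loss)
```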

Theorems & Definitions (8)

  • Theorem 1: First-Order Validity
  • Proof (sketch)
  • Theorem 2: Second-Order Optimality
  • Proof (sketch)
  • Proposition 3: First-Order Validity
  • Proof
  • Proposition 4: Second-Order Optimality
  • Proof