Training-Free Refinement of Flow Matching with Divergence-based Sampling

Yeonwoo Cha, Jaehoon Yoo, Semin Kim, Yunseo Park, Jinhyeon Kwon, Seunghoon Hong

Abstract

Flow-based models learn a target distribution by modeling a marginal velocity field, defined as the average of sample-wise velocities connecting each sample from a simple prior to the target data. When sample-wise velocities conflict at the same intermediate state, however, this averaged velocity can misguide samples toward low-density regions, degrading generation quality. To address this issue, we propose the Flow Divergence Sampler (FDS), a training-free framework that refines intermediate states before each solver step. Our key finding is that the severity of this misguidance is quantified by the divergence of the marginal velocity field, which is readily computable during inference with a well-optimized model. FDS exploits this signal to steer states toward less ambiguous regions. As a plug-and-play framework compatible with standard solvers and off-the-shelf flow backbones, FDS consistently improves fidelity across various generation tasks, including text-to-image synthesis and inverse problems.
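As a concrete illustration of the mechanism the abstract describes, here is a minimal PyTorch sketch of estimating the divergence of a velocity field with Hutchinson's trace estimator and using its gradient to nudge an intermediate state toward a lower-divergence region. All names (`v_fn`, `hutchinson_divergence`, `refine_state`, `step_size`) and the plain gradient-descent update are assumptions for illustration, not the paper's exact FDS update rule.

```python
import torch

def hutchinson_divergence(v_fn, x, t, create_graph=False):
    """Unbiased estimate of div_x v(x, t) via Hutchinson's trace trick.

    E_eps[eps^T (dv/dx) eps] = tr(dv/dx) = div v(x, t); exact divergence
    would need d backward passes, this needs one per probe.
    Assumes batched x of shape (B, ...) with x.requires_grad == True.
    """
    eps = torch.randn_like(x)
    v = v_fn(x, t)
    # One vector-Jacobian product: eps^T (dv/dx)
    vjp = torch.autograd.grad(v, x, grad_outputs=eps, create_graph=create_graph)[0]
    return (vjp * eps).flatten(1).sum(dim=1)  # per-sample divergence estimate

def refine_state(v_fn, x, t, step_size=0.05, n_iters=1):
    """Illustrative refinement: gradient descent on the estimated divergence,
    nudging x toward a lower-divergence region before the solver step."""
    for _ in range(n_iters):
        with torch.enable_grad():
            x = x.detach().requires_grad_(True)
            div = hutchinson_divergence(v_fn, x, t, create_graph=True)
            # Gradient of the divergence estimate w.r.t. the state
            # (second-order autograd; assumes the model supports double backward)
            g = torch.autograd.grad(div.sum(), x)[0]
        x = (x - step_size * g).detach()
    return x
```

Because Hutchinson's estimator costs a single extra backward pass per probe, the divergence signal stays cheap to evaluate in high dimensions, consistent with the abstract's claim that it is readily computable during inference.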

Paper Structure

This paper contains 51 sections, 5 theorems, 46 equations, 16 figures, 6 tables, and 2 algorithms.

Key Result

Theorem 1

For any $t$ such that $\alpha_t \neq 0$, the optimal CFM residual satisfies [equation omitted], where $d$ is the dimensionality of the data. $\blacktriangleleft$

Figures (16)

  • Figure 1: Overview of FDS. Our framework refines $x_{t_k}$ into $\tilde{x}_{t_k}$ at timestep $t_k$ to avoid high-discrepancy regions. In standard settings, severely conflicting sample-wise velocities can drive the marginal velocity toward a low-density region, leading to degraded samples (red cross). To counteract this, our framework effectively steers the trajectory toward a reliable, low-discrepancy region (blue circle).
  • Figure 1: Performance comparison on CIFAR-10 and ImageNet $256\times256$. FDS consistently improves generation quality in terms of FID across all configurations in the main experiments. $\dagger$ denotes the base solver with increased NFEs to match the wall-clock time of our framework.
  • Figure 2: 2D Synthetic Experiment shows our divergence-based criterion correlates with sample quality. (a) FDS achieves more accurate modeling of the target distribution than standard FM, yielding a lower Wasserstein Distance (WD). (b) Standard FM passes $x_{t_k}$ directly to the ODE solver, whereas FDS refines $x_{t_k}$ into $\tilde{x}_{t_k}$, moving it to a low-divergence region (see the sampling-loop sketch after this list). (c) Discrepancy maps computed from the ground-truth sample-wise velocities (Top) and from our inference-time surrogate using the pre-trained model (Bottom) are highly consistent.
  • Figure 2: Quantitative evaluation on inverse problems. Baselines equipped with FDS demonstrate consistent improvements across all metrics. For all reported metrics, lower is better.
  • Figure 3: Qualitative results on ImageNet $256\times256$ with JiT-L/16. Compared to the compute-matched baseline ($\dagger$), FDS effectively enhances generation quality.
  • ...and 11 more figures
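Figure 2(b) describes the core sampling change: standard FM hands $x_{t_k}$ straight to the ODE solver, while FDS first refines it into $\tilde{x}_{t_k}$. A minimal Euler sampling loop with such a refinement hook spliced in could look like the sketch below; the uniform time grid, the `refine` callback, and the `sample_with_fds` name are illustrative assumptions reusing `refine_state` from the earlier sketch, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def sample_with_fds(v_fn, x0, n_steps=50, refine=None):
    """Euler ODE integration from the prior sample x0 (t=0) to data (t=1).

    If `refine` is given, it maps (v_fn, x_t, t) -> refined x_t and is
    applied before each solver step, as described in Figure 2(b).
    """
    ts = torch.linspace(0.0, 1.0, n_steps + 1)
    x = x0
    for k in range(n_steps):
        t, dt = ts[k], ts[k + 1] - ts[k]
        if refine is not None:
            # Refinement needs autograd even though sampling runs under no_grad
            with torch.enable_grad():
                x = refine(v_fn, x, t)
        x = x + dt * v_fn(x, t)  # standard Euler step on the marginal velocity
    return x

# Hypothetical usage with the helpers sketched above:
# samples = sample_with_fds(model, torch.randn(16, 3, 32, 32), refine=refine_state)
```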

Theorems & Definitions (5)

  • Theorem 1
  • Lemma 1: Conditional probability path score $\nabla_{x_t}\log p_t(x_t \mid x_1)$
  • Lemma 2: Marginal score $\nabla_{x_t}\log p_t(x_t)$ identity
  • Lemma 3: Posterior score $\nabla_{x_t}\log p_t(x_1 \mid x_t)$ identity
  • Theorem 1