On the Guidance of Flow Matching

Ruiqi Feng, Chenglei Yu, Wenhao Deng, Peiyan Hu, Tailin Wu

TL;DR

This work addresses guiding flow-matching generative models beyond the Gaussian, diffusion-like setup by introducing a unified guidance framework that supports arbitrary source distributions and couplings. It derives a spectrum of guidance strategies, including a training-free Monte Carlo estimator $(g_t^{\text{MC}})$, a local Taylor-based guide $(g_t^{\text{local}})$, a Gaussian-posterior approximation-based guide $(g^{\text{sim}})$, and a trainable guide $(g_\phi)$ with Guidance Matching losses. The framework subsumes classical diffusion guidance as a special case under the uncoupled affine Gaussian-path assumption, and experiments on synthetic data, offline planning, and image inverse problems show that MC and learned guidance are robust across non-Gaussian and dependent-coupling settings, often outperforming gradient-based approximations. These results broaden the applicability of flow matching to a wider range of tasks by providing principled, versatile, and scalable guidance mechanisms, accompanied by public code for reproducibility.

Abstract

Flow matching has shown state-of-the-art performance in various generative tasks, ranging from image generation to decision-making, where generation under energy guidance (abbreviated as guidance in the following) is pivotal. However, the guidance of flow matching is more general than, and thus substantially different from, that of its predecessor, diffusion models. As a result, guidance for general flow matching remains largely underexplored. In this paper, we propose the first framework of general guidance for flow matching. From this framework, we derive a family of guidance techniques that can be applied to general flow matching. These include a new training-free, asymptotically exact guidance, novel training losses for training-based guidance, and two classes of approximate guidance that cover classical gradient guidance methods as special cases. We theoretically investigate these methods to give a practical guideline for choosing a suitable method in different scenarios. Experiments on synthetic datasets, image inverse problems, and offline reinforcement learning demonstrate the effectiveness of our proposed guidance methods and verify the correctness of our flow matching guidance framework. Code to reproduce the experiments can be found at https://github.com/AI4Science-WestlakeU/flow_guidance.

Paper Structure

This paper contains 48 sections, 6 theorems, 127 equations, 7 figures, 6 tables, 2 algorithms.

Key Result

Theorem 3.1

Adding the guidance vector field (VF) $g_t(x_t)$ to the original VF $v_t(x_t)$ forms a VF $v'_t(x_t)$ that generates $p_t'(x_t) = \int p_t(x_t|z)p'(z)dz$, as long as $g_t(x_t)$ satisfies the guidance expression of the theorem, in which $\mathcal{P} = \frac{\pi'(x_0|x_1)}{\pi(x_0|x_1)}$ is the reverse coupling ratio and $\pi'(x_0|x_1)$ is the reverse data coupling for the new VF, i.e., the distribution of $x_0$ given $x_1$ sampled from the target distribution ...
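
A short derivation sketch may help show where a condition of this kind comes from. It assumes the energy-weighted target $p'(x_1) = p(x_1)e^{-J(x_1)}/Z$ that appears in the Figure 4 caption, writes $z = (x_0, x_1)$ so that $p(z) = \pi(x_0|x_1)p(x_1)$, and keeps the conditional path $p_t(x_t|z)$ unchanged. The symbols $v_{t|z}$ (conditional VF) and $Z_t(x_t)$ (a local normalizer) are notation introduced here for illustration, so this is a reconstruction under stated assumptions and not a quotation of the theorem.

$$
\begin{aligned}
p'(z) &= \pi'(x_0|x_1)\,p'(x_1) = \mathcal{P}\,\frac{e^{-J(x_1)}}{Z}\,p(z), \\
p'(z|x_t) &= \frac{p_t(x_t|z)\,p'(z)}{p_t'(x_t)} = \frac{\mathcal{P}\,e^{-J(x_1)}}{Z_t(x_t)}\,p(z|x_t), \qquad Z_t(x_t) = \int \mathcal{P}\,e^{-J(x_1)}\,p(z|x_t)\,dz, \\
v_t(x_t) &= \int v_{t|z}(x_t|z)\,p(z|x_t)\,dz, \qquad v_t'(x_t) = \int v_{t|z}(x_t|z)\,p'(z|x_t)\,dz, \\
g_t(x_t) &= v_t'(x_t) - v_t(x_t) = \int \left(\frac{\mathcal{P}\,e^{-J(x_1)}}{Z_t(x_t)} - 1\right) v_{t|z}(x_t|z)\,p(z|x_t)\,dz .
\end{aligned}
$$

Under these assumptions the guidance is a reweighted average of the same conditional VFs, which is why it can be estimated by sampling $z$ without retraining the base model.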

Figures (7)

  • Figure 1: Overview of guidance methods in the paper. We start with a unified guidance expression and derive different guidance methods, including training-free and training-based methods, which cover many classical diffusion guidance methods.
  • Figure 2: Results of the synthetic dataset with different source (blue) and target (red) distributions. We visualize the start/end points and the flow trajectories. $g^{\text{MC}}$ and $g_\phi$ yield the best guidance across different settings while diffusion guidance fails.
  • Figure 3: $R$ distribution of generated trajectories in Locomotion. $g^{\text{MC}}$ matches the target gray dashed line well.
  • Figure 4: Error scaling with Monte Carlo sample number. In the synthetic dataset, the guidance error ($\mathcal{W}_2$ distance between the generated distribution and the ground truth energy-weighted distribution $p(x_1)e^{-J(x_1)} / Z$) decreases as the number of Monte Carlo samples increases. The dashed lines denote the $\mathcal{W}_2$ distance between the learned unguided distribution and the original ground truth distribution $p(x_1)$. The guided generation errors (crosses) do not converge to the dashed lines because the two are measured against different references: the crosses against $p(x_1)e^{-J(x_1)} / Z$ and the dashed lines against $p(x_1)$.
  • Figure 6: The visualization of the image inverse problems with the base flow matching model of mini-batch optimal transport conditional flow matching (OT-CFM). Three rows show the results of Gaussian deblurring, box-inpainting, and super-resolution from top to bottom.
  • ...and 2 more figures
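
The Monte Carlo guidance that Figures 2 to 4 refer to can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes a 1-D problem, a standard Gaussian source, independent coupling, and the linear path $x_t = t x_1 + (1-t) x_0$. The function name `mc_guidance` and the toy energy are hypothetical. The estimate reweights samples of $x_1$ by $p_t(x_t|x_1)$, with and without the factor $e^{-J(x_1)}$, and takes the difference of the two weighted averages of the conditional VF.

```python
import numpy as np


def mc_guidance(x_t, t, x1_samples, energy):
    """Self-normalized Monte Carlo estimate of a guidance vector field.

    Illustrative assumptions (not taken from the paper's code): 1-D data,
    standard Gaussian source, independent coupling, and the linear path
    x_t = t * x_1 + (1 - t) * x_0, so that p_t(x_t | x_1) is Gaussian with
    mean t * x_1 and standard deviation (1 - t).
    """
    sigma = 1.0 - t
    # Log of p_t(x_t | x_1), up to a constant shared by every sample.
    log_lik = -0.5 * ((x_t - t * x1_samples) / sigma) ** 2
    # Conditional vector field of the linear path.
    v_cond = (x1_samples - x_t) / sigma

    def normalized(log_w):
        w = np.exp(log_w - log_w.max())  # subtract the max for stability
        return w / w.sum()

    w_base = normalized(log_lik)                         # weights for v_t
    w_guided = normalized(log_lik - energy(x1_samples))  # weights for v'_t
    # Guidance = guided VF estimate minus unguided VF estimate.
    return float(np.sum((w_guided - w_base) * v_cond))


rng = np.random.default_rng(0)
x1 = rng.standard_normal(200_000)         # samples from a toy p(x_1) = N(0, 1)
energy = lambda x: 0.5 * (x - 2.0) ** 2   # toy quadratic energy J(x_1)
g_hat = mc_guidance(x_t=0.0, t=0.5, x1_samples=x1, energy=energy)
print(g_hat)
```

In this all-Gaussian toy case the guided posterior is available in closed form, and the exact guidance at $x_t = 0$, $t = 0.5$ is $4/3$, so `g_hat` should approach that value as the number of samples grows. This is consistent with the error scaling described for Figure 4.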

Theorems & Definitions (10)

  • Theorem 3.1
  • Proposition 3.4
  • Proposition 3.5
  • Theorem 1.1
  • Remark 1.2
  • Remark 1.3
  • Proposition 1.4
  • Proposition 1.5
  • Remark 1.6
  • Remark 1.7