GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models

Peter Holderrieth, Uriel Singer, Tommi Jaakkola, Ricky T. Q. Chen, Yaron Lipman, Brian Karrer

TL;DR

GLASS Flows address the inefficiency of reward-alignment methods that rely on stochastic sampling by enabling efficient ODE-based sampling of Markov transitions $p_{t'|t}$ via an inner, retrievable flow-matching model derived through sufficient statistics. This unifies the efficiency of ODE sampling with the stochasticity of SDEs, improving text-to-image generation performance when combined with reward-alignment techniques like Feynman-Kac Steering and reward guidance. The approach yields a theoretical framework for constructing transitions, practical algorithms for posterior sampling and SMC, and strong empirical gains on large-scale models, effectively removing the efficiency-stochasticity tradeoff in inference-time scaling. Overall, GLASS Flows serve as a plug-in, training-free enhancement for reward-aligned sampling across flow and diffusion models with clear practical impact for scalable, high-quality generation.

Abstract

The performance of flow matching and diffusion models can be greatly improved at inference time using reward alignment algorithms, yet efficiency remains a major limitation. While several algorithms have been proposed, we demonstrate that a common bottleneck is the sampling method these algorithms rely on: many of them require sampling Markov transitions via SDE sampling, which is significantly less efficient and often less performant than ODE sampling. To remove this bottleneck, we introduce GLASS Flows, a new sampling paradigm that simulates a "flow matching model within a flow matching model" to sample Markov transitions. As we show in this work, this "inner" flow matching model can be retrieved from a pre-trained model without any re-training, combining the efficiency of ODEs with the stochastic evolution of SDEs. On large-scale text-to-image models, we show that GLASS Flows eliminate the trade-off between stochastic evolution and efficiency. Combined with Feynman-Kac Steering, GLASS Flows improve state-of-the-art performance in text-to-image generation, making them a simple, drop-in solution for inference-time scaling of flow and diffusion models.

Paper Structure

This paper contains 44 sections, 7 theorems, 101 equations, 14 figures, 5 tables, 1 algorithm.

Key Result

Proposition 0

For $\rho=\frac{\alpha_t\sigma_{t'}}{\sigma_{t}\alpha_{t'}}$, we get that: $p_{t'|t}^{\text{DDPM}}(X_{t'}|X_{t})=p_{t'|t}(X_{t'}|X_{t})$, i.e. DDPM transitions are a special case of GLASS transitions.
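As a small worked illustration of the formula for $\rho$ above, the sketch below evaluates it for one pair of times. The schedule $\alpha_t = t$, $\sigma_t = 1 - t$ is an assumption made here for concreteness (the summary does not state which schedule the paper uses), and the names `linear_schedule` and `ddpm_rho` are hypothetical helpers, not part of any released code.

```python
def linear_schedule(t):
    """Assumed schedule (not taken from the source): alpha_t = t, sigma_t = 1 - t."""
    return t, 1.0 - t


def ddpm_rho(t, t_next, schedule=linear_schedule):
    """Evaluate rho = (alpha_t * sigma_t') / (sigma_t * alpha_t') for times t and t'."""
    alpha_t, sigma_t = schedule(t)
    alpha_next, sigma_next = schedule(t_next)
    denominator = sigma_t * alpha_next
    if denominator == 0.0:
        # The ratio is undefined when sigma_t = 0 or alpha_t' = 0.
        raise ValueError("rho is undefined when sigma_t or alpha_t' is zero")
    return (alpha_t * sigma_next) / denominator


if __name__ == "__main__":
    # Under the assumed schedule: (0.2 * 0.5) / (0.8 * 0.5)
    print(ddpm_rho(0.2, 0.5))
```

Under this assumed schedule, $t=0.2$ and $t'=0.5$ give $\rho \approx 0.25$, and $t'=t$ gives $\rho = 1$. A different schedule can be supplied through the `schedule` argument.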

Figures (14)

  • Figure 1: GLASS Flows overview. Left: Sampling transition $p_{t'|t}(x_{t'}|x_t)$ with GLASS Flows. Initial Gaussian samples $\bar{x}_{s=0}$ are evolved from inner time $s=0$ to $s=1$ via the velocity field $u_s(\bar{x}_s|x_t,t)$ that is obtained by transforming a pre-trained flow matching model. Right: Reward alignment with GLASS Flows improves text-image alignment.
  • Figure 2: Posterior sampling experiments. We noise images and then sample from the posterior $z\sim p_{1|t}(\cdot|x)$ via DDPM or GLASS Flows. Left: Examples for $t=0.2$ and $M=6$ simulation steps. Middle: FID values for various simulation steps $M$ and time $t$. Right: Estimation of the value function as assessed by correlation with ground truth (200 Monte Carlo samples with $M=200$).
  • Figure 3: Sampling from SiT/FLUX with various sampling methods. Left: Comparison with FLUX of images generated with DDPM vs. GLASS Flows. DDPM samples are more blurry and of lower quality. Middle: Results for SiT. Right: Results for FLUX. Prompts: "Carrots" and "Refrigerator".
  • Figure 4: Detailed results for Figure 2 (Middle). Comparing the performance of sampling the posterior $p_{1|t}$ via GLASS Flows (Ours) and SDE (DDPM) sampling. We ablate over different times $t$ and numbers of sampling steps. GLASS Flows achieve significantly lower FID than DDPM sampling at low numbers of sampling steps.
  • Figure 5: Detailed results for Figure 2 (Right). Comparing the performance of estimating the value function $V_t(x)$ by sampling the posterior $p_{1|t}$ via GLASS Flows (Ours) and SDE (DDPM) sampling, measured by correlation with the ground truth. The experiment is performed for different times $t$ and sampling steps $M$. GLASS Flows achieve significantly higher correlation than DDPM sampling at low numbers of steps. Ground truth is measured via 200 samples with 200 simulation steps of ODE/SDE.
  • ...and 9 more figures

Theorems & Definitions (11)

  • Proposition 0
  • Proposition 0
  • Theorem 0
  • Proposition 0
  • proof
  • Lemma 1: Equivalent observations for multivariate Gaussian
  • proof
  • Proposition 0
  • proof
  • Theorem 0
  • ...and 1 more