Bellman Optimal Stepsize Straightening of Flow-Matching Models

Bao Nguyen, Binh Nguyen, Viet Anh Nguyen

TL;DR

This work tackles the computational burden of flow-matching generative models by introducing Bellman Optimal Stepsize Straightening (BOSS). It first computes a Bellman-optimal nonuniform sampling schedule $\{\tau_k\}$ for a pretrained velocity network, then retrains the network to straighten the generation path along that schedule, enabling high-quality samples with a small number of function evaluations. Across multiple image datasets, BOSS yields substantial efficiency gains and competitive FID scores, even with limited retraining, and demonstrates transferability of stepsizes and effective use of Low-Rank Adaptation (LoRA) to further reduce trainable parameters. The approach advances practical, resource-efficient flow-based generation, reducing environmental footprint while maintaining sample fidelity.

Abstract

Flow matching is a powerful framework for generating high-quality samples in various applications, especially image synthesis. However, the intensive computational demands of these models, especially during finetuning and sampling, pose significant challenges for low-resource scenarios. This paper introduces the Bellman Optimal Stepsize Straightening (BOSS) technique for distilling flow-matching generative models: it targets efficient few-step image sampling under a computational budget constraint. First, a dynamic programming algorithm optimizes the stepsizes of the pretrained network. Then, the velocity network is refined to match the optimal stepsizes, with the aim of straightening the generation paths. Extensive experimental evaluations across image generation tasks demonstrate the efficacy of BOSS in terms of both resource utilization and image quality. Our results reveal that BOSS achieves substantial gains in efficiency while maintaining competitive sample quality, effectively bridging the gap between low-resource constraints and the demanding requirements of flow-matching generative models. Our paper also supports the responsible development of artificial intelligence, offering a more sustainable generative model that reduces computational costs and environmental footprints. Our code can be found at https://github.com/nguyenngocbaocmt02/BOSS.
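
The abstract does not spell out the dynamic programming recursion, so the following is only a minimal sketch of how such a stepsize search could look. It assumes a fine time grid with nodes 0, ..., n-1, a precomputed table `cost[j][k]` estimating the error of jumping from node j to node k in a single step, and a total error that is the sum of the jump costs. The function name `bellman_schedule` and the cost table layout are hypothetical and are not taken from the authors' code.

```python
import math


def bellman_schedule(cost, num_steps):
    """Choose exactly `num_steps` jumps from node 0 to node n-1 with minimal summed cost.

    cost[j][k] (for j < k) is an assumed, precomputed error estimate for
    jumping from grid node j to grid node k in one step.
    Returns (path, total_cost), where path lists the visited node indices.
    """
    n = len(cost)
    if not 1 <= num_steps <= n - 1:
        raise ValueError("num_steps must be between 1 and len(cost) - 1")

    # best[m][k]: minimal cost of reaching node k from node 0 with exactly m jumps.
    best = [[math.inf] * n for _ in range(num_steps + 1)]
    # prev[m][k]: predecessor of node k on that optimal m-jump path.
    prev = [[-1] * n for _ in range(num_steps + 1)]
    best[0][0] = 0.0

    # Bellman recursion: extend every (m-1)-jump path by one more jump j -> k.
    for m in range(1, num_steps + 1):
        for k in range(1, n):
            for j in range(k):
                candidate = best[m - 1][j] + cost[j][k]
                if candidate < best[m][k]:
                    best[m][k] = candidate
                    prev[m][k] = j

    # Backtrack from the terminal node to recover the chosen nodes.
    path = [n - 1]
    k = n - 1
    for m in range(num_steps, 0, -1):
        k = prev[m][k]
        path.append(k)
    path.reverse()
    return path, best[num_steps][n - 1]
```

With a toy table `cost[j][k] = (k - j) ** 2` on five nodes and `num_steps = 2`, the sketch returns the uniform path `[0, 2, 4]`. If the cost of a jump grows with its starting node, for example `(k - j) ** 2 * (1 + 10 * j)`, it returns the nonuniform path `[0, 3, 4]`, taking a short final step where jumps are most expensive. The triple loop runs in O(K n^2) time for K steps on n nodes.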

Paper Structure (21 sections, 17 equations, 16 figures, 8 tables, 1 algorithm)

Figures (16)

  • Figure 1: An example with $K^{\max} = 5$ to illustrate the computation of the sampling error.
  • Figure 2: A network flow formulation to find the optimal sampling schedule for image generation. Time $t_0=0$ represents noise, while $t_{K^{\max}} = 1$ is the terminal data (images). Each discretized timestamp is represented by a node, with edges reflecting the one-dimensional flow of time from noise to image. The cost $c_{jk}$ associated with each edge is the sampling error estimate, measured by the average difference between the Euler one-step and the Euler $(k-j)$-step sampling between $t_j$ and $t_k$, see Section \ref{sec:cost}.
  • Figure 3: Continued example following Figure \ref{fig:cost} for straightening with $K = 2$ NFEs, evaluated at time $t_0 = 0$ and time $t_2 = 0.4$. Blue arrows are velocity vectors given by the pretrained model, and purple arrows following the dashed lines are the ideal straight path. The straightening procedure in Section \ref{sec:straighten} aims to align the blue arrows with the purple arrows. Arrows illustrate directions only and are not drawn to scale.
  • Figure 4: The FID score of sampling methods with different numbers of function evaluations (step sizes). Images generated by samplers using Bellman stepsizes clearly show lower FID than conventional ones that use uniform step sizes. Note that Uniform Heun and Bellman Heun are second-order sampling methods that use twice the NFEs.
  • Figure 5: Qualitative results on the unconditional image generation task. From first to last row: CelebA-HQ, LSUN-Bedroom, LSUN-Church, and AFHQ-Cat datasets. (a)-(b): Comparison of Euler sampling with uniform stepsizes (a) and Bellman optimal stepsizes (b); (c)-(d): Comparison of BOSS retraining and Runge-Kutta-45 sampling. BOSS sampling achieves visual quality comparable to RK45 while requiring only 6 NFEs, compared to 208 NFEs for RK45.
  • ...and 11 more figures
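
The edge cost $c_{jk}$ described in the Figure 2 caption can be illustrated with a small sketch. It makes several assumptions that the captions do not confirm: states are scalars instead of images, the difference is measured by absolute value, and the state at each $t_j$ is obtained by running Euler along the full fine grid from the initial noise sample. The names `edge_costs`, `velocity`, `starts`, and `grid` are hypothetical.

```python
def edge_costs(velocity, starts, grid):
    """Estimate cost[j][k] as the mean gap between one Euler jump and fine-grid Euler.

    velocity(x, t): assumed velocity field, a stand-in for a pretrained network.
    starts: initial (noise) samples at time grid[0]; scalars in this toy sketch.
    grid: increasing list of times, e.g. from 0.0 (noise) to 1.0 (data).
    """
    n = len(grid)

    # Fine-grid Euler trajectory of every sample: trajs[i][j] is the state at grid[j].
    trajs = []
    for x in starts:
        traj = [x]
        for t0, t1 in zip(grid, grid[1:]):
            traj.append(traj[-1] + (t1 - t0) * velocity(traj[-1], t0))
        trajs.append(traj)

    cost = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            total = 0.0
            for traj in trajs:
                # Single Euler jump from t_j directly to t_k.
                one_step = traj[j] + (grid[k] - grid[j]) * velocity(traj[j], grid[j])
                # The (k - j)-step Euler result from the same start is the stored state.
                multi_step = traj[k]
                total += abs(one_step - multi_step)
            cost[j][k] = total / len(trajs)
    return cost
```

In this sketch, jumps between adjacent grid nodes always cost zero because the one-step and multi-step results coincide, and a constant velocity field (a perfectly straight path) gives zero cost for every edge. For image-valued states, the absolute value would be replaced by a norm over pixels, and the choice of norm is a further assumption.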