FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference

Divya Jyoti Bajpai; Dhruv Bhardwaj; Soumya Roy; Tejas Duseja; Harsh Agarwal; Aashay Sandansing; Manjesh Kumar Hanawal

TL;DR

Flow-matching models achieve high fidelity but suffer from slow, sequential inference because generation denoises step by step along a trajectory. FastFlow is a training-free adaptive inference framework that uses a per-timestep multi-armed bandit (MAB) to decide when to skip steps. At skipped steps the velocity is extrapolated with a first-order Taylor update whose derivative is a finite-difference estimate from past predictions; the final-state error is bounded by $e_T = O(|\mathcal{S}|/T^3)$, where $\mathcal{S}$ is the set of skipped steps and $T$ is the total number of steps. The paper contributes a theoretical error bound, a practical MAB-based adaptive mechanism, and empirical speedups of over 2.6x across image, video, and editing tasks while maintaining quality. It is plug-and-play and generalizes across FM-based models and tasks, enabling real-time generation on constrained hardware.
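
One plausible reading of the extrapolation step can be written out as follows. It assumes a uniform step size $\Delta t = 1/T$ and the forward Euler update named in Theorem 3.1; the symbols $v_\theta$, $v_{t_k}$, and $\hat{v}$ are notation introduced here for illustration, not taken from the paper.

```latex
\begin{align}
x_{t_{k+1}} &= x_{t_k} + \Delta t \, v_{t_k}, \qquad v_{t_k} = v_\theta(x_{t_k}, t_k)
  && \text{(full model evaluation)} \\
\hat{v}_{t_{k+1}} &= v_{t_k} + \Delta t \cdot \frac{v_{t_k} - v_{t_{k-1}}}{\Delta t}
  = 2\, v_{t_k} - v_{t_{k-1}}
  && \text{(skipped step: Taylor update with finite difference)}
\end{align}
```

Under this reading, and if the velocity varies smoothly along the trajectory, the extrapolated velocity is off by $O(\Delta t^2)$, so each skipped step perturbs the state by roughly $\Delta t \cdot O(\Delta t^2) = O(1/T^3)$. That is consistent with a bound that grows linearly in the number of skipped steps, although the theorem's precise assumptions and constants are not reproduced here.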

Abstract

Flow-matching models deliver state-of-the-art fidelity in image and video generation, but their inherently sequential denoising process makes them slow. Existing acceleration methods such as distillation, trajectory truncation, and consistency approaches are static, require retraining, and often fail to generalize across tasks. We propose FastFlow, a plug-and-play adaptive inference framework that accelerates generation in flow-matching models. FastFlow identifies denoising steps that make only minor adjustments to the denoising path and approximates them without evaluating the full neural network used for velocity prediction. The approximation uses finite-difference velocity estimates from prior predictions to extrapolate future states, advancing along the denoising path at zero compute cost and allowing computation at intermediate steps to be skipped. We model the decision of how many steps can safely be skipped before a full model computation is required as a multi-armed bandit problem. The bandit learns the skip lengths that best balance speed and quality. FastFlow integrates seamlessly with existing pipelines and generalizes across image generation, video generation, and editing tasks. Experiments demonstrate a speedup of over 2.6x while maintaining high-quality outputs. The source code for this work can be found at https://github.com/Div290/FastFlow.
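
The skipping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `velocity` is a toy stand-in for the expensive network, `euler_sample` and `skip_steps` are hypothetical names, and the skip schedule is fixed by hand rather than chosen by a bandit.

```python
import numpy as np


def velocity(x, t):
    """Toy stand-in (assumption) for the expensive velocity network."""
    return -x * (1.0 + t)


def euler_sample(x0, num_steps, skip_steps=frozenset()):
    """Forward Euler on [0, 1].

    At steps listed in skip_steps, the velocity is linearly extrapolated
    from the two most recent velocities instead of calling `velocity`.
    Returns the final state and the number of full model calls.
    """
    dt = 1.0 / num_steps
    x = np.asarray(x0, dtype=float)
    v_prev, v_curr = None, None
    model_calls = 0
    for k in range(num_steps):
        t = k * dt
        if k in skip_steps and v_prev is not None:
            # First-order Taylor update with a finite-difference slope:
            # v_curr + dt * (v_curr - v_prev) / dt
            v_next = 2.0 * v_curr - v_prev
        else:
            # Full evaluation (always used until two velocities are available).
            v_next = velocity(x, t)
            model_calls += 1
        x = x + dt * v_next
        v_prev, v_curr = v_curr, v_next
    return x, model_calls


if __name__ == "__main__":
    start = [1.0, -2.0]
    skips = frozenset(k for k in range(2, 50) if k % 2 == 0)
    x_full, calls_full = euler_sample(start, 50)
    x_fast, calls_fast = euler_sample(start, 50, skips)
    print("model calls:", calls_full, "vs", calls_fast)
    print("max abs deviation:", np.max(np.abs(x_full - x_fast)))
```

On this smooth toy field, skipping every other step roughly halves the number of model calls while the final state stays close to the full Euler result. A real network's velocity can change much faster at some timesteps, which is the motivation for choosing skips adaptively.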

Paper Structure (19 sections, 2 theorems, 22 equations, 11 figures, 3 tables, 1 algorithm)

Introduction
Related works
Methodology
Flow Matching Overview
Our method
Experiments
Results
Analysis
Conclusion
Ethics and Reproducibility Statement
Appendix
FastFlow vs. TeaCache vs. Direct Reduction
Speedup vs performance curve
FastFlow’s skip patterns:
Adaptiveness of FastFlow
...and 4 more sections

Key Result

Theorem 3.1

Let $\{ x_{t_k}^{\mathrm{true}} \}$ denote the trajectory obtained using the exact velocity field with the forward Euler method, and let $\{ x_{t_k}^{\mathrm{approx}} \}$ be the trajectory where velocity evaluations are skipped at a subset of steps $\mathcal{S} \subseteq \{0, \dots, T-1\}$ and are i

Figures (11)

Figure 1: Overview of our method. At each step, the multi-armed bandit (MAB) selects the number of steps to approximate the trajectory. The bandit receives a reward proportional to the number of steps successfully approximated, while deviations from the computed velocity incur a penalty. This adaptive strategy allows the model to balance efficiency and accuracy across the trajectory.
Figure 2: Comparison of edit quality across two models: BAGEL and FLUX. Each subfigure reports semantic consistency (G_SC), perceptual quality (G_PQ), and overall score (G_O) versus speedup.
Figure 3: Comparison of video generation for the HunYuanVideo model. We report the VBench score and the BRISQUE metric for the quality of the generated frames.
Figure 4: L1-relative error between consecutive velocity predictions in the BAGEL model. While the trajectories may appear constant at intermediate scales, a finer analysis uncovers subtle yet systematic variations, indicating that the underlying dynamics are not strictly stable.
Figure 5: Generated examples for the image generation task with the BAGEL and FLUX models.
...and 6 more figures
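
The reward structure described in the Figure 1 caption (reward for steps approximated, penalty for deviation from the computed velocity) can be sketched with a small bandit. Everything specific here is an assumption for illustration: the UCB1 selection rule, the arm set, the `penalty` weight, and the simulated deviation model are not taken from the paper, and `SkipBandit`, `skip_reward`, and `simulate` are hypothetical names.

```python
import math
import random


class SkipBandit:
    """UCB1 bandit whose arms are candidate skip lengths (an assumption)."""

    def __init__(self, arms=(0, 1, 2, 3), exploration=1.0):
        self.arms = list(arms)
        self.exploration = exploration
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)

    def select(self):
        """Return the index of the arm to play next."""
        for i, count in enumerate(self.counts):
            if count == 0:
                return i  # play every arm once before using UCB scores
        total = sum(self.counts)
        scores = [
            mean + self.exploration * math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(self.arms)), key=scores.__getitem__)

    def update(self, index, reward):
        """Incrementally update the running mean reward of one arm."""
        self.counts[index] += 1
        self.means[index] += (reward - self.means[index]) / self.counts[index]


def skip_reward(skipped, deviation, penalty=5.0):
    """Reward grows with steps skipped and shrinks with velocity deviation."""
    return skipped - penalty * deviation


def simulate(rounds=2000, seed=0):
    """Toy environment: deviation grows quadratically with the skip length."""
    rng = random.Random(seed)
    bandit = SkipBandit()
    for _ in range(rounds):
        index = bandit.select()
        skip = bandit.arms[index]
        deviation = max(0.0, 0.05 * skip ** 2 + rng.gauss(0.0, 0.02))
        bandit.update(index, skip_reward(skip, deviation))
    return bandit


if __name__ == "__main__":
    bandit = simulate()
    for arm, count, mean in zip(bandit.arms, bandit.counts, bandit.means):
        print(f"skip={arm}: pulls={count}, mean reward={mean:.3f}")
```

In this toy environment the expected reward peaks at a skip length of 2, and the bandit concentrates its pulls there. For a per-timestep scheme, one such bandit instance could be kept per denoising step, though how the paper organizes its bandits is not specified in this summary.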

Theorems & Definitions (3)

Theorem 3.1
Theorem B.1
proof
