FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
Divya Jyoti Bajpai, Dhruv Bhardwaj, Soumya Roy, Tejas Duseja, Harsh Agarwal, Aashay Sandansing, Manjesh Kumar Hanawal
TL;DR
Flow-matching models achieve high fidelity but suffer from slow inference because sampling proceeds through sequential denoising steps along a trajectory. FastFlow is a training-free adaptive inference framework that uses a per-timestep multi-armed bandit (MAB) to decide when to skip steps. Skipped steps are approximated by extrapolating the velocity with a first-order Taylor update whose derivative is a finite difference of past predictions; the final-state error is bounded by $e_T = O(|S|/T^3)$. The paper contributes a theoretical error bound, a practical MAB-based adaptive mechanism, and empirical speedups of over 2.6x across image generation, video generation, and editing tasks while maintaining quality. It is plug-and-play and generalizes across flow-matching (FM) models and tasks, enabling real-time generation on constrained hardware.
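To make the velocity extrapolation concrete, here is a minimal sketch on a toy one-dimensional ODE. It is not the authors' implementation: the `velocity` function is a hypothetical stand-in for the expensive network, the fixed `skip_every` schedule replaces the bandit's adaptive decisions, and plain Euler integration with uniform steps is assumed.

```python
import math


def velocity(x, t):
    """Toy stand-in for the expensive velocity network (hypothetical)."""
    return -x + math.sin(t)


def sample(x0, num_steps, skip_every=0):
    """Euler-integrate dx/dt = velocity(x, t) from t=0 to t=1.

    If skip_every > 0, every skip_every-th step reuses the two most recent
    velocities instead of calling velocity(). Returns (x_final, model_calls).
    """
    dt = 1.0 / num_steps
    x, calls = x0, 0
    v_prev = v_curr = None  # velocities used at the two previous steps
    for i in range(num_steps):
        t = i * dt
        want_skip = skip_every > 0 and i % skip_every == skip_every - 1
        if want_skip and v_prev is not None:
            # Finite-difference derivative: (v_curr - v_prev) / dt.
            # First-order Taylor update: v_curr + dt * derivative.
            v = v_curr + dt * (v_curr - v_prev) / dt
        else:
            v = velocity(x, t)  # full (expensive) evaluation
            calls += 1
        v_prev, v_curr = v_curr, v
        x = x + dt * v
    return x, calls


x_full, calls_full = sample(1.0, num_steps=20)
x_fast, calls_fast = sample(1.0, num_steps=20, skip_every=2)
print(calls_full, calls_fast, abs(x_full - x_fast))
```

With 20 steps and `skip_every=2`, the second run makes 11 model calls instead of 20, because the first skip has to wait until two real velocities are available. The printed difference shows how far the extrapolated trajectory drifts from the fully evaluated one on this toy problem.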
Abstract
Flow-matching models deliver state-of-the-art fidelity in image and video generation, but their inherently sequential denoising process makes inference slow. Existing acceleration methods such as distillation, trajectory truncation, and consistency approaches are static, require retraining, and often fail to generalize across tasks. We propose FastFlow, a plug-and-play adaptive inference framework that accelerates generation in flow-matching models. FastFlow identifies denoising steps that produce only minor adjustments to the denoising path and approximates them without invoking the full neural network used for velocity prediction. The approximation uses finite-difference velocity estimates from prior predictions to extrapolate future states, advancing along the denoising path at zero compute cost and thereby skipping computation at intermediate steps. We model the decision of how many steps to safely skip before requiring a full model computation as a multi-armed bandit problem. The bandit learns the optimal number of skips to balance speed with output quality. FastFlow integrates seamlessly with existing pipelines and generalizes across image generation, video generation, and editing tasks. Experiments demonstrate a speedup of over 2.6x while maintaining high-quality outputs. The source code for this work can be found at https://github.com/Div290/FastFlow.
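The skip decision can be illustrated with a small bandit sketch. The abstract does not state which bandit algorithm, arm set, or reward FastFlow uses, so everything here is an assumption: UCB1 is swapped in as a standard choice, the arms are candidate skip counts 0 to 3, and `TOY_MEAN_REWARD` is a made-up table standing in for a reward that trades compute saved against approximation error.

```python
import math
import random


class UCB1SkipBandit:
    """UCB1 over candidate skip counts (illustrative, not the paper's method)."""

    def __init__(self, skip_choices):
        self.arms = list(skip_choices)
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)

    def select(self):
        # Play every arm once before using confidence bounds.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)

        def score(i):
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[i])
            return self.means[i] + bonus

        return max(range(len(self.arms)), key=score)

    def update(self, i, reward):
        # Incremental running mean of the rewards observed for arm i.
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]

    def best_skip(self):
        # Report the skip count that was chosen most often.
        i = max(range(len(self.arms)), key=lambda j: self.counts[j])
        return self.arms[i]


# Hypothetical mean reward per skip count, in [0, 1].
TOY_MEAN_REWARD = {0: 0.2, 1: 0.5, 2: 0.8, 3: 0.3}


def run_bandit(rounds=2000, seed=0):
    rng = random.Random(seed)
    bandit = UCB1SkipBandit(sorted(TOY_MEAN_REWARD))
    for _ in range(rounds):
        i = bandit.select()
        noisy = TOY_MEAN_REWARD[bandit.arms[i]] + rng.gauss(0.0, 0.05)
        bandit.update(i, min(1.0, max(0.0, noisy)))  # clamp to [0, 1]
    return bandit


bandit = run_bandit()
print(bandit.best_skip(), bandit.counts)
```

In this toy setting the bandit concentrates its pulls on the skip count with the highest mean reward. A per-timestep variant, as the TL;DR describes, could keep one such instance for each denoising step.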
