Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim, Taehoon Yoon, Jisung Hwang, Minhyuk Sung
TL;DR
This work introduces inference-time scaling for pretrained flow models by (i) converting the deterministic flow dynamics into an SDE to enable particle sampling, (ii) replacing the linear interpolant with a variance-preserving (VP) interpolant to broaden the search space and boost diversity, and (iii) proposing Rollover Budget Forcing (RBF) to adaptively allocate compute across timesteps. Together, SDE-based generation and conversion to the VP interpolant substantially improve reward alignment for flow models on compositional and quantity-aware image generation tasks. RBF delivers the strongest gains, with additional synergistic benefits when rewards are differentiable. The results indicate that stochastic generation and adaptive compute allocation can close the gap between flow and diffusion models for inference-time scaling, producing high-quality, aligned outputs under a limited compute budget. The approach offers a practical way to improve the controllability of flow-based generators in complex prompting scenarios, while also highlighting trade-offs in compute overhead and robustness to misuse.
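As a rough illustration of step (i), the standard way to turn a deterministic flow into a stochastic process with the same marginals can be written as follows. The notation here is an assumption made for this sketch and is not taken from the paper: u_t is the pretrained velocity field, p_t the marginal density at time t, sigma_t a freely chosen noise schedule, and time t increases along the generation direction. The paper's exact parameterization may differ.

```latex
% Assumed notation (not from the source): u_t is the pretrained velocity field,
% p_t the marginal density at time t, \sigma_t \ge 0 a chosen noise schedule,
% and W_t a standard Wiener process. Requires amsmath for the align environment.
\begin{align}
  \text{ODE:}\quad & \mathrm{d}x_t = u_t(x_t)\,\mathrm{d}t, \\
  \text{SDE:}\quad & \mathrm{d}x_t = \Big[\, u_t(x_t)
      + \tfrac{\sigma_t^2}{2}\,\nabla_x \log p_t(x_t) \Big]\mathrm{d}t
      + \sigma_t\,\mathrm{d}W_t .
\end{align}
```

Both processes share the marginals p_t: in the Fokker-Planck equation of the SDE, the score term in the drift cancels the diffusion term, leaving the continuity equation of the ODE. Setting sigma_t = 0 recovers the deterministic flow, while sigma_t > 0 injects the intermediate stochasticity that particle sampling relies on.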
Abstract
We propose an inference-time scaling approach for pretrained flow models. Recently, inference-time scaling has gained significant attention in LLMs and diffusion models, improving sample quality or better aligning outputs with user preferences by leveraging additional computation. For diffusion models, particle sampling has allowed more efficient scaling due to the stochasticity at intermediate denoising steps. In contrast, flow models have gained popularity as an alternative to diffusion models, offering faster generation and high-quality outputs in state-of-the-art image and video generative models, yet the efficient inference-time scaling methods used for diffusion models cannot be directly applied to them because their generative process is deterministic. To enable efficient inference-time scaling for flow models, we propose three key ideas: 1) SDE-based generation, enabling particle sampling in flow models, 2) interpolant conversion, broadening the search space and enhancing sample diversity, and 3) Rollover Budget Forcing (RBF), an adaptive allocation of computational resources across timesteps to maximize budget utilization. Our experiments show that SDE-based generation, particularly variance-preserving (VP) interpolant-based generation, improves the performance of particle sampling methods for inference-time scaling in flow models. Additionally, we demonstrate that RBF with VP-SDE achieves the best performance, outperforming all previous inference-time scaling approaches.
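The abstract describes RBF only as an adaptive allocation of compute across timesteps. The sketch below shows one plausible reading of that idea on a toy scalar state, and it is an assumption rather than the authors' algorithm: each timestep receives a fixed quota of proposals, sampling stops as soon as a candidate beats the current reward, and any unused proposals roll over to the next timestep. The names `rollover_budget_search`, `propose`, `reward`, and `quota_per_step` are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def rollover_budget_search(
    init_state: float,
    propose: Callable[[float, int, random.Random], float],
    reward: Callable[[float], float],
    num_steps: int,
    quota_per_step: int,
    seed: int = 0,
) -> Tuple[float, List[int]]:
    """Toy rollover-style budget allocation (illustrative assumption only).

    propose(state, step, rng) draws one stochastic candidate for the next state.
    Returns the final state and the number of proposals used at each step.
    """
    if quota_per_step < 1:
        raise ValueError("quota_per_step must be at least 1")

    rng = random.Random(seed)
    state = init_state
    current_reward = reward(state)
    carry = 0  # unused proposals rolled over from earlier steps
    used_per_step: List[int] = []

    for step in range(num_steps):
        budget = quota_per_step + carry
        used = 0
        best_cand, best_cand_reward = state, float("-inf")

        while used < budget:
            cand = propose(state, step, rng)
            used += 1
            r = reward(cand)
            if r > best_cand_reward:
                best_cand, best_cand_reward = cand, r
            if r > current_reward:
                break  # improvement found: stop early and save the rest

        carry = budget - used
        # Always advance, falling back to the best candidate seen at this step.
        state, current_reward = best_cand, best_cand_reward
        used_per_step.append(used)

    return state, used_per_step
```

A hypothetical call with a noisy proposal, such as `rollover_budget_search(0.0, lambda s, t, rng: s + rng.gauss(0.0, 1.0), lambda x: -abs(x - 3.0), num_steps=10, quota_per_step=4)`, never spends more than `num_steps * quota_per_step` proposals in total, since a step can only draw on its own quota plus what earlier steps left unused.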
