Test-Time Scaling of Diffusion Models via Noise Trajectory Search
Vignav Ramesh, Morteza Mardani
TL;DR
This work tackles test-time scaling for diffusion models by optimizing the noise trajectory over the denoising steps rather than simply increasing the step count. It first casts diffusion sampling as a finite-horizon MDP with a terminal reward, then relaxes the MDP to a sequence of contextual bandits. The relaxation enables an ε-greedy search that explores globally at the extreme timesteps and exploits locally in the middle steps. The method achieves state-of-the-art or competitive results on class-conditional ImageNet and text-to-image generation (Stable Diffusion) under non-differentiable rewards, often matching or exceeding Monte Carlo tree search (MCTS) while avoiding its computational burden. Because the approach is gradient-free, it offers a practical and scalable way to improve diffusion-model outputs under arbitrary reward criteria, and it highlights the value of adapting the search strategy to the diffusion timestep.
Abstract
The iterative and stochastic nature of diffusion models enables test-time scaling, whereby spending additional compute during denoising generates higher-fidelity samples. Increasing the number of denoising steps is the primary scaling axis, but it yields quickly diminishing returns. Optimizing the noise trajectory (the sequence of injected noise vectors) is a promising alternative, since the specific noise realizations critically affect sample quality. It is challenging, however, because the search space is high-dimensional, noise-outcome interactions are complex, and trajectory evaluations are costly. We address this by first casting diffusion as a Markov Decision Process (MDP) with a terminal reward, and we show that tree-search methods such as Monte Carlo tree search (MCTS) are meaningful but impractical. To balance performance and efficiency, we then relax the MDP and view denoising as a sequence of independent contextual bandits. This allows us to introduce an $\epsilon$-greedy search algorithm that globally explores at extreme timesteps and locally exploits during the intermediate steps, where de-mixing occurs. Experiments on EDM and Stable Diffusion reveal state-of-the-art scores for class-conditioned and text-to-image generation, exceeding baselines by up to $164\%$ and matching or exceeding MCTS performance. To our knowledge, this is the first practical method for test-time noise trajectory optimization of arbitrary (non-differentiable) rewards.
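To make the contextual-bandit view concrete, the following is a minimal toy sketch of a per-step $\epsilon$-greedy noise search, not the authors' implementation. Everything in it is an assumption made for illustration: the names `epsilon_schedule`, `toy_denoise_step`, and `toy_reward`, the linear exploration schedule, the one-step lookahead used to score candidates, and the rescaled local perturbation. A real sampler (for example EDM or Stable Diffusion) and a real reward model would replace the toy stand-ins.

```python
import numpy as np


def epsilon_schedule(t, num_steps, eps_hi=0.9, eps_lo=0.1):
    """Hypothetical schedule: explore most at the first/last steps, least in the middle."""
    mid = (num_steps - 1) / 2.0
    # Normalized distance from the midpoint: 1 at the extremes, 0 at the middle.
    dist = abs(t - mid) / mid if mid > 0 else 1.0
    return eps_lo + (eps_hi - eps_lo) * dist


def toy_denoise_step(x, noise, t, num_steps):
    """Stand-in for one stochastic sampler step (assumption, not a real diffusion model)."""
    sigma = 1.0 - (t + 1) / num_steps  # injected-noise scale shrinks to 0 at the last step
    return 0.8 * x + sigma * noise


def toy_reward(x):
    """Stand-in black-box reward; the search never uses its gradient."""
    return -float(np.sum(np.abs(np.round(x, 1) - 1.0)))


def epsilon_greedy_noise_search(x0, num_steps, num_candidates, rng, local_scale=0.3):
    """At each step, treat the current state as context and the noise vector as the action."""
    x = x0
    for t in range(num_steps):
        eps = epsilon_schedule(t, num_steps)
        best_noise = rng.standard_normal(x.shape)
        best_reward = toy_reward(toy_denoise_step(x, best_noise, t, num_steps))
        for _ in range(num_candidates - 1):
            if rng.random() < eps:
                # Global exploration: draw a fresh Gaussian noise vector.
                cand = rng.standard_normal(x.shape)
            else:
                # Local exploitation: perturb the best noise so far, rescaled
                # so the candidate keeps roughly unit variance.
                step = local_scale * rng.standard_normal(x.shape)
                cand = (best_noise + step) / np.sqrt(1.0 + local_scale ** 2)
            reward = toy_reward(toy_denoise_step(x, cand, t, num_steps))
            if reward > best_reward:
                best_noise, best_reward = cand, reward
        # Commit the best noise found for this step and move to the next context.
        x = toy_denoise_step(x, best_noise, t, num_steps)
    return x, toy_reward(x)


# Example usage on a 4-dimensional toy state.
sample, score = epsilon_greedy_noise_search(
    np.zeros(4), num_steps=5, num_candidates=8, rng=np.random.default_rng(0)
)
```

In this sketch each step is optimized greedily in isolation, which mirrors the independence assumption of the bandit relaxation: the search cost grows linearly with `num_steps * num_candidates`, whereas a tree search over full trajectories would have to evaluate many complete rollouts.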
