Test-Time Scaling of Diffusion Models via Noise Trajectory Search

Vignav Ramesh, Morteza Mardani

TL;DR

This work tackles test-time scaling for diffusion models by optimizing the noise trajectory over denoising steps rather than simply increasing the step count. It first casts diffusion sampling as a finite-horizon MDP with a terminal reward, then relaxes this to a sequence of contextual bandits, enabling an ε-greedy search that explores globally at extreme timesteps and exploits locally at intermediate ones. The method achieves state-of-the-art or competitive results on class-conditional ImageNet generation and text-to-image generation (Stable Diffusion) across non-differentiable rewards, often matching or exceeding Monte Carlo tree search (MCTS) while avoiding its computational burden. The result is a practical, gradient-free approach to noise-trajectory optimization that adapts its search strategy across diffusion timesteps and supports controllable generation under arbitrary reward criteria.

Abstract

The iterative and stochastic nature of diffusion models enables test-time scaling, whereby spending additional compute during denoising generates higher-fidelity samples. Increasing the number of denoising steps is the primary scaling axis, but this yields quickly diminishing returns. Instead, optimizing the noise trajectory--the sequence of injected noise vectors--is promising, as the specific noise realizations critically affect sample quality; but this is challenging due to a high-dimensional search space, complex noise-outcome interactions, and costly trajectory evaluations. We address this by first casting diffusion as a Markov Decision Process (MDP) with a terminal reward, showing tree-search methods such as Monte Carlo tree search (MCTS) to be meaningful but impractical. To balance performance and efficiency, we then resort to a relaxation of MDP, where we view denoising as a sequence of independent contextual bandits. This allows us to introduce an $\epsilon$-greedy search algorithm that globally explores at extreme timesteps and locally exploits during the intermediate steps where de-mixing occurs. Experiments on EDM and Stable Diffusion reveal state-of-the-art scores for class-conditioned/text-to-image generation, exceeding baselines by up to $164\%$ and matching/exceeding MCTS performance. To our knowledge, this is the first practical method for test-time noise trajectory optimization of arbitrary (non-differentiable) rewards.
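
The contextual-bandit view can be illustrated with a small sketch. This is not the authors' implementation: `toy_step` is a hypothetical stand-in for one stochastic denoising update, `brightness` is a toy reward, and the neighborhood radius `lam * sqrt(d)` and the rule of keeping the current pivot unless a candidate beats it are assumptions made for illustration. The defaults `eps=0.4` and `lam=0.15` simply echo the ranges reported in the Figure 5 caption. At each timestep, the search draws $N$ candidates per iteration, taking each one either fresh from the standard Normal (with probability $\epsilon$) or from a neighborhood of the current pivot, and keeps the best-scoring noise.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def gaussian(d: int, rng: random.Random) -> Vector:
    """Draw a fresh standard-Normal noise vector (global exploration)."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def perturb(pivot: Vector, lam: float, rng: random.Random) -> Vector:
    """Draw a point within distance lam * sqrt(d) of the pivot (local exploitation)."""
    d = len(pivot)
    direction = gaussian(d, rng)
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    radius = rng.random() * lam * math.sqrt(d)  # assumed neighborhood size
    return [p + radius * v / norm for p, v in zip(pivot, direction)]


def epsilon_greedy_noise(
    score: Callable[[Vector], float],
    d: int,
    rng: random.Random,
    n_candidates: int = 8,  # N in the text
    n_iters: int = 10,      # K in the text
    eps: float = 0.4,
    lam: float = 0.15,
) -> Vector:
    """Search for a high-scoring noise vector at a single timestep."""
    pivot = gaussian(d, rng)
    best = score(pivot)
    for _ in range(n_iters):
        candidates = [
            gaussian(d, rng) if rng.random() < eps else perturb(pivot, lam, rng)
            for _ in range(n_candidates)
        ]
        scored = [(score(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda sc: sc[0])
        if top_score > best:  # assumption: keep the pivot unless it is beaten
            pivot, best = top, top_score
    return pivot


def brightness(x: Vector) -> float:
    """Toy reward: mean value of the sample."""
    return sum(x) / len(x)


def toy_step(x: Vector, noise: Vector, sigma: float) -> Vector:
    """Hypothetical stand-in for one stochastic denoising update."""
    return [0.9 * xi + sigma * ni for xi, ni in zip(x, noise)]


def search_trajectory(d: int, sigmas: List[float], seed: int = 0) -> Vector:
    """Pick the noise at every step independently, given the current state."""
    rng = random.Random(seed)
    x = gaussian(d, rng)
    for sigma in sigmas:
        # The current state x is the bandit's context for this step.
        noise = epsilon_greedy_noise(
            lambda n: brightness(toy_step(x, n, sigma)), d, rng
        )
        x = toy_step(x, noise, sigma)
    return x


if __name__ == "__main__":
    final = search_trajectory(d=16, sigmas=[1.0, 0.5, 0.1])
    print("final toy reward:", brightness(final))
```

In a real sampler, `score` would denoise with the candidate noise and evaluate the reward on the resulting sample, so each call costs at least one model evaluation; the number of evaluations per step grows as $N \times K$.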

Paper Structure

This paper contains 52 sections, 1 theorem, 11 equations, 8 figures, 11 tables, 7 algorithms.

Key Result

Theorem 1

Let the noise at diffusion step $t$ be $\mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \sigma_t^2\mathbf{I}_d)$. Fix a confidence level $\eta$ and truncate noise-space to the high-probability ball $\Gamma_t = \{ \mathbf{z} \in \mathbb{R}^d : \lvert \mathbf{z} \rvert \leq \sigma_t \sqrt{d \ln (1/\eta \ldots}$ … Run Algorithm 1 ($\epsilon$-greedy local search) for $K$ iterations with $N$ candidate points per i…

Figures (8)

  • Figure 1: (Left) Implicit denoising tree traversed by search algorithms. (Right) Visualization of local search in noise space to maximize reward at a single timestep ${t}$.
  • Figure 2: (Left) Average local-search iteration at which a random Normal candidate is chosen ($\bar{k}$) as a function of $\sigma_t$. (Right) Estimated Lipschitz constant of the ImageNet reward as a function of $\mathbf{x}_t$, across $\sigma_t$ values.
  • Figure 3: Performance scaling with number of noise candidates. Performance of sampling methods as the number of noise candidates $N$ per timestep increases, measured on three reward functions (brightness, compressibility, ImageNet). $\epsilon$-greedy achieves the best scaling law, attaining the highest rewards at the majority of $N$ values; however, it experiences non-monotonic gains due to its greedy local selection, whereas MCTS—by approximating exhaustive search—shows steady, monotonic improvement.
  • Figure 4: EDM results by sampling method, varying $\mathbb{E}_t[K_t]$. We use the same class labels and (other than $K$) algorithm hyperparameters as in Table 1 of the main paper.
  • Figure 5: (Left) Average reward vs. number of search iterations per timestep. (Center) Sweep over $\epsilon$ for $\epsilon$-greedy. Highlights an optimal $\epsilon\approx 0.4$ that balances exploration and exploitation. (Right) Reward vs. maximum step size scaling factor $\lambda$ for zero-order and $\epsilon$-greedy search. Demonstrates optimal $\lambda \in [0.1,0.2]$ and drastic performance drops at large $\lambda$, due to the candidate set being overwhelmed by degenerate noise samples far outside the standard Normal distribution.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Theorem 1: Regret of $\epsilon$-greedy search