Inference-time Scaling of Diffusion Models through Classical Search

Xiangcheng Zhang, Haowei Lin, Haotian Ye, James Zou, Jianzhu Ma, Yitao Liang, Yilun Du

TL;DR

This work tackles inference-time scaling for diffusion models by recasting sampling as a classical search problem. It introduces a unified framework that combines BFS/DFS global search for diverse modes with annealed Langevin MCMC-based local search guided by a verifier, enabling high-quality outputs beyond the base model. Demonstrations across long-horizon planning, offline RL, and image generation show improved performance and efficiency, with a double-verifier strategy mitigating reward hacking. The approach offers a principled, scalable pathway to adapt diffusion-based outputs to varied test-time objectives without retraining.
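To make the global-search idea concrete, here is a minimal toy sketch of breadth-first search over a stochastic denoising process. It is not the paper's implementation: `denoise_step`, `verifier`, `bfs_sample`, the 1-D state, and all constants are hypothetical stand-ins chosen for illustration. Scoring noisy intermediate states directly with the verifier is also an assumption. At every step each kept particle spawns several children, and only the highest-scoring ones survive.

```python
import random


def denoise_step(x, rng, shrink=0.8, noise_std=0.3):
    # Hypothetical stochastic update standing in for one reverse-diffusion step.
    return shrink * x + rng.gauss(0.0, noise_std)


def verifier(x, target=1.0):
    # Hypothetical verifier: higher is better, maximal at x == target.
    return -(x - target) ** 2


def bfs_sample(width, branch, n_steps=10, seed=0):
    """Keep `width` particles; expand each into `branch` children per step."""
    rng = random.Random(seed)
    frontier = [rng.gauss(0.0, 1.0) for _ in range(width)]  # initial noise
    for _ in range(n_steps):
        children = [denoise_step(x, rng) for x in frontier for _ in range(branch)]
        children.sort(key=verifier, reverse=True)  # best verifier score first
        frontier = children[:width]                # prune to the beam width
    return frontier[0]                             # best surviving sample


if __name__ == "__main__":
    direct = bfs_sample(width=1, branch=1)    # plain sampling, no search
    searched = bfs_sample(width=4, branch=4)  # BFS with pruning
    print("direct score:  ", verifier(direct))
    print("searched score:", verifier(searched))
```

With `width=1, branch=1` the routine reduces to ordinary sampling, so the same function serves as its own baseline. Raising `width` and `branch` spends more compute per step in exchange for higher verifier scores.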

Abstract

Classical search algorithms have long underpinned modern artificial intelligence. In this work, we tackle the challenge of inference-time control in diffusion models -- adapting generated outputs to meet diverse test-time objectives -- using principles from classical search. We propose a general framework that orchestrates local and global search to efficiently navigate the generative space. It employs a theoretically grounded local search via annealed Langevin MCMC and performs compute-efficient global exploration using breadth-first and depth-first tree search. We evaluate our approach on a range of challenging domains, including planning, offline reinforcement learning, and image generation. Across all tasks, we observe significant gains in both performance and efficiency. These results show that classical search provides a principled and practical foundation for inference-time scaling in diffusion models. Project page at https://diffusion-inference-scaling.github.io/.
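The local-search component can be illustrated with a small annealed Langevin sampler on a 1-D toy problem. This is a sketch under stated assumptions, not the authors' code: the base distribution is taken to be a standard Gaussian, the verifier is a hypothetical quadratic score, and each annealing level targets the noised base density multiplied by the exponentiated verifier score. The function names, noise schedule, and step size are all illustrative choices.

```python
import math
import random


def base_score(x, sigma):
    # Score of N(0, 1) convolved with N(0, sigma^2) noise, i.e. N(0, 1 + sigma^2).
    return -x / (1.0 + sigma * sigma)


def verifier_grad(x, target=3.0):
    # Gradient of a hypothetical verifier f(x) = -(x - target)^2 / 2.
    return -(x - target)


def annealed_langevin(x, sigmas, rng, steps_per_level=20, step_size=0.05):
    """Run unadjusted Langevin steps at each noise level, from high to low."""
    for sigma in sigmas:
        for _ in range(steps_per_level):
            grad = base_score(x, sigma) + verifier_grad(x)  # tilted score
            noise = math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
            x = x + step_size * grad + noise
    return x


def run_chains(n_chains=2000, seed=0):
    rng = random.Random(seed)
    sigmas = [2.0 * (1 - i / 9) for i in range(10)]  # anneal from 2.0 down to 0.0
    init_std = math.sqrt(1.0 + sigmas[0] ** 2)
    return [
        annealed_langevin(rng.gauss(0.0, init_std), sigmas, rng)
        for _ in range(n_chains)
    ]


if __name__ == "__main__":
    samples = run_chains()
    print("sample mean:", sum(samples) / len(samples))
```

In this toy setting the final target is the product of two unit-variance Gaussians centred at 0 and 3, so the samples should concentrate near 1.5, between the base mode and the verifier optimum. The discretised chain has a small step-size bias in its variance.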

Paper Structure

This paper contains 74 sections, 2 theorems, 48 equations, 9 figures, 10 tables, 5 algorithms.

Key Result

Proposition 1

In the continuous limit where the number of diffusion denoising steps $T\rightarrow\infty$, training-free guidance with recurrence is equivalent to running Langevin MCMC on a series of annealed distributions $\left\{ \Tilde{q}_t({\bm{x}}_t) \right\}_{t=0}^T$, with $\Tilde{q}_0({\bm{x}}_0)=\tilde{p}_

Figures (9)

  • Figure 1: Illustration of our search framework. Bottom left: direct sampling results in samples with low verifier scores. Bottom middle: global search identifies high score modes within the base distribution. Bottom right: local search further optimizes the samples for higher quality, driven by the gradient signal.
  • Figure 2: Illustration of global tree search algorithms.
  • Figure 3: CompBench text-to-image results with DFS. Left: the Pareto curve of DFS, where DFS-$\delta$ denotes DFS with threshold $\delta_t=\delta$. Right: average compute allocation by DFS for prompts of increasing difficulty.
  • Figure 4: Pareto curves of local search. Left: Pareto curves of best-of-N with different local search steps, where BoN-$i$ denotes $i$ local search steps. Right: Pareto curves of BFS and DFS with 6 local search steps.
  • Figure 5: Illustration of the maze layout and task, with a failed trajectory (left) and a successful sample (right).
  • ...and 4 more figures

Theorems & Definitions (3)

  • Proposition 1
  • Theorem 1
  • proof