Dynamic Search for Inference-Time Alignment in Diffusion Models

Xiner Li, Masatoshi Uehara, Xingyu Su, Gabriele Scalia, Tommaso Biancalani, Aviv Regev, Sergey Levine, Shuiwang Ji

TL;DR

This work tackles the problem of aligning diffusion-generated outputs to non-differentiable reward signals during inference. It reframes inference-time alignment as a reward-driven search over denoising trajectories and introduces Dynamic Search for Diffusion (DSearch), which dynamically allocates computation through beam-width and tree-width adaptation, plus a lookahead heuristic. The method is instantiated with a pruning strategy using pre-trained policies, node-level heuristics, and an efficient lookahead estimation, and is validated across image, biological sequence, and molecular design tasks, showing superior reward optimization while preserving diversity and near-original likelihood. Overall, DSearch offers a scalable, gradient-free, inference-time alignment framework with practical impact for reward-guided generation in scientific domains where differentiable rewards are unavailable or unreliable.

Abstract

Diffusion models have shown promising generative capabilities across diverse domains, yet aligning their outputs with desired reward functions remains a challenge, particularly in cases where reward functions are non-differentiable. Some gradient-free guidance methods have been developed, but they often struggle to achieve optimal inference-time alignment. In this work, we frame inference-time alignment in diffusion as a search problem and propose Dynamic Search for Diffusion (DSearch), which subsamples from denoising processes and approximates intermediate node rewards. It also dynamically adjusts beam width and tree expansion to efficiently explore high-reward generations. To refine intermediate decisions, DSearch incorporates adaptive scheduling based on noise levels and a lookahead heuristic function. We validate DSearch across multiple domains, including biological sequence design, molecular optimization, and image generation, demonstrating superior reward optimization compared to existing approaches.
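To make the search framing concrete, the following is a minimal sketch of reward-guided beam search over a denoising trajectory, not the authors' DSearch implementation. Everything in it is an assumption for illustration: `toy_denoise_step` is a hypothetical stand-in for a pre-trained diffusion model's reverse step, `toy_reward` is an arbitrary non-differentiable reward, and intermediate states are scored directly rather than through a lookahead estimate. Beam width and tree width are held constant here, whereas the abstract describes adjusting them dynamically.

```python
import random


def toy_denoise_step(x, t, num_steps, rng):
    """Hypothetical stand-in for one reverse step of a pre-trained model.

    Adds Gaussian noise whose scale shrinks as t approaches 0.
    """
    scale = t / num_steps
    return [xi + rng.gauss(0.0, scale) for xi in x]


def toy_reward(x):
    """Black-box, non-differentiable reward: coordinates lying in [0.5, 1.5)."""
    return sum(1 for xi in x if 0.5 <= xi < 1.5)


def reward_guided_search(num_steps, beam_width, tree_width, dim, seed=0):
    """Keep the `beam_width` highest-reward candidates at every step."""
    rng = random.Random(seed)
    # Start every beam from Gaussian noise (time step t = num_steps).
    beams = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(beam_width)]
    for t in range(num_steps, 0, -1):
        # Expand: each surviving beam proposes `tree_width` children.
        children = [toy_denoise_step(x, t, num_steps, rng)
                    for x in beams for _ in range(tree_width)]
        # Select: rank children by reward only; no gradient is needed.
        children.sort(key=toy_reward, reverse=True)
        beams = children[:beam_width]
    return beams
```

Calling `reward_guided_search(num_steps=5, beam_width=3, tree_width=4, dim=6)` returns three final candidates ordered from highest to lowest toy reward. Each step evaluates `beam_width * tree_width` candidates, which is the per-step cost that a dynamic schedule would redistribute.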

Paper Structure

This paper contains 50 sections, 14 equations, 31 figures, 4 tables, 3 algorithms.

Figures (31)

  • Figure 1: Inference-time alignment of diffusion model as a search problem. We propose a dynamic search to maximize rewards efficiently and effectively. The top-down process visualizes the diffusion denoising trajectory starting from Gaussian noise down to the final sample $x_0$. Green circles indicate tree nodes, representing candidate samples at a time step, while darker nodes mark higher potential rewards. Red slashes denote selections, while nodes without selected children are pruned branches (suboptimal candidates eliminated during search). Blue arrows trace the final high-reward trajectory dynamically selected to maximize the downstream reward under computational budgets.
  • Figure 2: Illustration of DSearch. Our proposed dynamic search has expanding tree widths. We dynamically adjust weaker beams and reallocate their computational resources to other beams across time steps, fixing $w(t)b(t)$ while strategically scheduling $b(t)$.
  • Figure 3: Generated samples from DSearch. For more samples, please refer to the appendix (app:vis). Note that the surfaces and ribbons in (e) are representations of the target proteins, while the generated small molecules are displayed in the center.
  • Figure 4: Reward (median & standard deviation) under different constraints $\bar{C}$.
  • Figure 5: Reward distributions of generated samples using DSearch with different scheduling algorithms. We fix $\bar{C}=40$ for DNA task and $\bar{C}=20$ for molecular task. For search scheduling, "all" has $|\mathcal{A}|=T$ while other algorithms have $|\mathcal{A}|/T=65\%\pm 1\%$. For beam scheduling, we use $\frac{b(T)}{b(0)}=4$ for different algorithms except "None", which does not use beam reduction.
  • ...and 26 more figures
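The Figure 2 caption describes holding the product $w(t)b(t)$ fixed while scheduling $b(t)$. The sketch below shows one way such a trade-off could be computed. It is an assumption-laden illustration, not the paper's scheduling algorithm: the linear shape of the beam schedule, the function names, and the integer rounding are all hypothetical, and the numbers only borrow the ratio $b(T)/b(0)=4$ mentioned in the Figure 5 caption.

```python
def linear_beam_schedule(b_start, b_end, num_steps):
    """Hypothetical beam schedule, listed from t = T (noisiest) down to t = 0.

    Interpolates linearly from b_start to b_end and rounds to integers.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [round(b_start + (b_end - b_start) * i / (num_steps - 1))
            for i in range(num_steps)]


def tree_widths(beam_schedule, budget):
    """Tree width per step so that w(t) * b(t) never exceeds `budget`.

    Integer division means the product equals the budget only when
    b(t) divides it exactly; otherwise it falls slightly below.
    """
    return [budget // b for b in beam_schedule]


# Illustrative numbers only: beam width shrinks from 8 to 2 (ratio 4)
# under an assumed per-step budget of 40 candidate evaluations.
beams = linear_beam_schedule(b_start=8, b_end=2, num_steps=4)
widths = tree_widths(beams, budget=40)
for b, w in zip(beams, widths):
    print(f"b={b:2d}  w={w:2d}  b*w={b * w}")
```

As the beam count falls, each surviving beam receives a larger tree width, so the later, lower-noise steps explore more children per beam without raising the per-step cost.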

Theorems & Definitions (1)

  • Remark 2.1: Parametrization