Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design

Masatoshi Uehara, Xingyu Su, Yulai Zhao, Xiner Li, Aviv Regev, Shuiwang Ji, Sergey Levine, Tommaso Biancalani

TL;DR

This work introduces Reward-Guided Iterative Refinement in Diffusion Models (RERD), a test-time framework that progressively improves outputs by alternating partial noising with reward-guided denoising. It provides a theoretical guarantee that the iterative process targets the distribution $p^{(\alpha)}(\cdot) \propto \exp\left(\frac{r(\cdot)}{\alpha}\right) p^{\mathrm{pre}}(\cdot)$, combining pre-trained diffusion models with reward signals for complex design tasks. The authors instantiate a practical version that blends local importance sampling and global resampling, drawing connections to evolutionary algorithms to enable constraint handling and refinement. Empirical results on protein and cell-type–specific DNA design show that RERD outperforms single-shot baselines across multiple reward metrics while preserving model naturalness. The work highlights the potential of iterative, inference-time refinement to unlock more expressive reward optimization in diffusion-based design pipelines.
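To make the target distribution concrete, the following minimal sketch evaluates $p^{(\alpha)}(x) \propto \exp(r(x)/\alpha)\, p^{\mathrm{pre}}(x)$ on a three-element design space. The probabilities, rewards, and the function name `tilted_distribution` are illustrative assumptions, not values or code from the paper. It only shows how $\alpha$ trades off the reward against the pre-trained model: a large $\alpha$ stays close to $p^{\mathrm{pre}}$, while a small $\alpha$ concentrates mass on high-reward designs.

```python
import math


def tilted_distribution(p_pre, rewards, alpha):
    """Return p^(alpha)(x) proportional to exp(r(x) / alpha) * p_pre(x) on a finite set."""
    # Subtract the maximum reward before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    r_max = max(rewards)
    weights = [p * math.exp((r - r_max) / alpha) for p, r in zip(p_pre, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy inputs (not from the paper).
p_pre = [0.5, 0.3, 0.2]    # pre-trained model probabilities
rewards = [0.0, 1.0, 2.0]  # reward of each candidate design

for alpha in (100.0, 1.0, 0.1):
    probs = tilted_distribution(p_pre, rewards, alpha)
    print(f"alpha={alpha}: {[round(q, 3) for q in probs]}")
```

In this toy setting the third design has the lowest pre-trained probability but the highest reward, so it dominates as $\alpha$ shrinks.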

Abstract

To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have recently been proposed, current approaches predominantly focus on single-shot generation, transitioning from fully noised to denoised states. We propose a novel framework for inference-time reward optimization with diffusion models, inspired by evolutionary algorithms. Our approach employs an iterative refinement process in which each iteration consists of two steps: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. We also provide a theoretical guarantee for our framework. Finally, we demonstrate its superior empirical performance in protein and cell-type-specific regulatory DNA design. The code is available at https://github.com/masa-ue/ProDifEvo-Refinement.
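The two-step iteration described in the abstract (noising, then reward-guided denoising) can be sketched on a toy sequence problem. This is a simplified illustration under stated assumptions, not the authors' implementation: the "pre-trained model" is taken to be uniform over the letters `ACGT`, the reward is a hypothetical GC-count, and the names `partial_noise`, `guided_denoise`, and `refine` are invented for this example.

```python
import math
import random

ALPHABET = "ACGT"
MASK = "_"


def reward(seq):
    # Hypothetical reward: number of G/C characters (masked positions count as 0).
    return sum(ch in "GC" for ch in seq)


def partial_noise(seq, mask_frac, rng):
    """Noising step: replace a random subset of positions with the mask token."""
    n_mask = max(1, int(mask_frac * len(seq)))
    masked = set(rng.sample(range(len(seq)), n_mask))
    return [MASK if i in masked else ch for i, ch in enumerate(seq)]


def guided_denoise(seq, alpha, rng):
    """Denoising step: fill each masked position with a reward-weighted draw.

    Assumes a uniform prior over letters, so the prior cancels and the weight
    of each letter is exp(reward / alpha) of the partially filled sequence.
    """
    seq = list(seq)
    for i, ch in enumerate(seq):
        if ch != MASK:
            continue
        weights = []
        for letter in ALPHABET:
            trial = seq[:i] + [letter] + seq[i + 1:]
            weights.append(math.exp(reward(trial) / alpha))
        seq[i] = rng.choices(ALPHABET, weights=weights, k=1)[0]
    return seq


def refine(seq, n_iters, mask_frac=0.3, alpha=0.2, seed=0):
    """Alternate partial noising and reward-guided denoising for n_iters rounds."""
    rng = random.Random(seed)
    seq = list(seq)
    history = [reward(seq)]
    for _ in range(n_iters):
        seq = guided_denoise(partial_noise(seq, mask_frac, rng), alpha, rng)
        history.append(reward(seq))
    return "".join(seq), history


final_seq, reward_history = refine("ATATATATAT", n_iters=20)
print(final_seq, reward_history)
```

Because only a fraction of positions is re-noised in each round, earlier choices can be revisited without discarding the whole design, which is the intuition behind gradual error correction.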

Paper Structure

This paper contains 48 sections, 1 theorem, 15 equations, 14 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Suppose (a) the initial design $x^{\langle 0 \rangle}_0$ follows $p^{(\alpha)}$ (defined in eq:goal), (b) the marginal distributions induced by the forward noising process match those of the learned noising process in the pre-trained diffusion models. Then, the output $x^{\langle S \rangle}_0$ from

Figures (14)

  • Figure 1: Our proposed framework follows an iterative process, with each iteration injecting noise into the sample and then denoising it while optimizing rewards. For sequences, this can be implemented via masked diffusion, initialized from pre-trained diffusion models (left). Our algorithm can continuously refine the outputs by gradually correcting errors introduced during reward-guided denoising, improving the design over successive iterations (middle). For instance, for the task of optimizing the similarity (RMSD) of a protein to a target structure (Red), we can progressively minimize the RMSD through refinement, optimizing the design from an initial (Orange) fit to a better final fit (Green), as shown on the right.
  • Figure 2: Existing reward-guided algorithms can be viewed as sequentially sampling from $x_T$ to $x_0$ following the soft optimal policy $\{p^{\star}_t\}_{t=T}^1$. The primary distinction among these algorithms lies in how $p^{\star}_t$ is approximated.
  • Figure 3: Summary of RERD: We instantiate it within masked diffusion models. It alternates reward-guided denoising and noising.
  • Figure 4: Visualization of alg:decoding2. A reward-guided denoising consists of two components: local value-weighted sampling for each sample (from $k=K$ to $k=1$) and global resampling among samples in a batch at $k=1$.
  • Figure 5: Generated proteins (Green) when optimizing ss-match are shown. Red represents the target secondary structures. The ss-match score for the left figure is 0.96, while for the right figure, it is 1.0.
  • ...and 9 more figures
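The two components named in the Figure 4 caption, local value-weighted sampling per sample and global resampling across the batch at $k=1$, can be sketched as follows. This is a hedged toy version rather than the paper's algorithm: the prior is assumed uniform over `ACGT`, the value of a partially unmasked sequence is approximated by a hypothetical GC-count reward, and all function names and parameters (`local_step`, `global_resample`, `n_candidates`) are assumptions made for illustration.

```python
import math
import random

ALPHABET = "ACGT"
MASK = "_"


def reward(seq):
    # Hypothetical reward, also used as a stand-in value function:
    # number of G/C characters; masked positions contribute nothing.
    return sum(ch in "GC" for ch in seq)


def exp_weights(values, alpha):
    # Unnormalized weights exp(v / alpha), shifted by the max for stability.
    v_max = max(values)
    return [math.exp((v - v_max) / alpha) for v in values]


def local_step(seq, n_candidates, alpha, rng):
    """Local importance sampling: propose from the prior, select by value weight."""
    masked = [i for i, ch in enumerate(seq) if ch == MASK]
    if not masked:
        return seq
    i = rng.choice(masked)
    candidates = []
    for _ in range(n_candidates):
        trial = list(seq)
        trial[i] = rng.choice(ALPHABET)  # draw from the assumed uniform prior
        candidates.append(trial)
    weights = exp_weights([reward(c) for c in candidates], alpha)
    return rng.choices(candidates, weights=weights, k=1)[0]


def global_resample(batch, alpha, rng):
    """Global resampling: redraw the whole batch in proportion to exp(reward / alpha)."""
    weights = exp_weights([reward(s) for s in batch], alpha)
    return [list(s) for s in rng.choices(batch, weights=weights, k=len(batch))]


def guided_denoise_batch(batch, n_steps, n_candidates=4, alpha=0.2, seed=0):
    rng = random.Random(seed)
    batch = [list(s) for s in batch]
    for k in range(n_steps, 0, -1):  # k = K, ..., 1
        batch = [local_step(s, n_candidates, alpha, rng) for s in batch]
        if k == 1:
            batch = global_resample(batch, alpha, rng)
    return ["".join(s) for s in batch]


print(guided_denoise_batch(["A__T", "_C_G", "AA__"], n_steps=2))
```

The local step only compares candidates for a single sample, whereas the resampling step lets high-reward samples replace low-reward ones across the batch, which is where the analogy to selection in evolutionary algorithms comes from.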

Theorems & Definitions (3)

  • Example 1: Masked Diffusion Models
  • Theorem 1: Target Distribution of RERD
  • Remark 1