Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design

Masatoshi Uehara; Xingyu Su; Yulai Zhao; Xiner Li; Aviv Regev; Shuiwang Ji; Sergey Levine; Tommaso Biancalani

TL;DR

This work introduces Reward-Guided Iterative Refinement in Diffusion Models (RERD), a test-time framework that progressively improves outputs by alternating partial noising with reward-guided denoising. The authors prove that the iterative process targets the distribution $p^{(\alpha)}(\cdot) \propto \exp\left(\frac{r(\cdot)}{\alpha}\right) p^{\mathrm{pre}}(\cdot)$, where $p^{\mathrm{pre}}$ is the pre-trained diffusion model, $r$ is the reward function, and $\alpha$ controls how strongly the reward reshapes the pre-trained distribution. They instantiate a practical version that combines local importance sampling with global resampling, and draw connections to evolutionary algorithms that allow the method to handle constraints during refinement. Empirical results on protein design and cell-type-specific DNA design show that RERD outperforms single-shot baselines across multiple reward metrics while preserving the naturalness of the generated sequences. The work highlights the potential of iterative, inference-time refinement for reward optimization in diffusion-based design pipelines.
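To make the target distribution concrete, the following is a minimal sketch that evaluates $p^{(\alpha)}$ on a small finite set, where the normalizing constant can be computed exactly. The three-element `p_pre` and `rewards` lists are made-up illustrative values, and `tilted_distribution` is a hypothetical helper, not part of the authors' code.

```python
import math


def tilted_distribution(p_pre, rewards, alpha):
    """Return p_alpha(x) proportional to exp(r(x) / alpha) * p_pre(x) on a finite set."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Subtract the largest scaled reward before exponentiating for numerical stability.
    shift = max(r / alpha for r in rewards)
    weights = [p * math.exp(r / alpha - shift) for p, r in zip(p_pre, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only (assumptions, not from the paper).
p_pre = [0.5, 0.3, 0.2]
rewards = [0.0, 1.0, 2.0]

for alpha in (10.0, 1.0, 0.1):
    probs = tilted_distribution(p_pre, rewards, alpha)
    print(alpha, [round(p, 3) for p in probs])
```

A large $\alpha$ leaves the result close to the pre-trained distribution, while a small $\alpha$ concentrates mass on the highest-reward element.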

Abstract

To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. Numerous algorithms for reward-guided generation have recently been proposed, but current approaches predominantly focus on single-shot generation, transitioning from fully noised to denoised states. We propose a novel framework for inference-time reward optimization with diffusion models, inspired by evolutionary algorithms. Our approach employs an iterative refinement process in which each iteration consists of two steps: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. We also provide a theoretical guarantee for our framework. Finally, we demonstrate its superior empirical performance in protein design and cell-type-specific regulatory DNA design. The code is available at https://github.com/masa-ue/ProDifEvo-Refinement.
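The two-step loop described above (noising, then reward-guided denoising) can be sketched on a toy problem. This is a heavily simplified illustration under stated assumptions, not the authors' implementation: the "pre-trained denoiser" is replaced by a uniform fill of masked positions, the reward is a made-up GC-content score, and reward guidance is approximated by drawing several candidates and resampling one with weights proportional to $\exp(r/\alpha)$. All function names and default parameters are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def toy_reward(seq):
    # Hypothetical reward: fraction of G or C characters in the sequence.
    return sum(c in "GC" for c in seq) / len(seq)


def partial_noise(seq, mask_frac, rng):
    # Noising step: mask a random subset of positions (None marks a masked slot).
    n_mask = max(1, int(mask_frac * len(seq)))
    masked_idx = set(rng.sample(range(len(seq)), n_mask))
    return [None if i in masked_idx else c for i, c in enumerate(seq)]


def toy_denoise(masked, rng):
    # Stand-in for a pre-trained denoiser: fill masked slots uniformly at random.
    return "".join(rng.choice(ALPHABET) if c is None else c for c in masked)


def reward_guided_denoise(masked, n_candidates, alpha, rng):
    # Draw several denoised candidates, then resample one with
    # probability proportional to exp(reward / alpha).
    candidates = [toy_denoise(masked, rng) for _ in range(n_candidates)]
    rewards = [toy_reward(c) for c in candidates]
    best = max(rewards)
    weights = [math.exp((r - best) / alpha) for r in rewards]
    return rng.choices(candidates, weights=weights, k=1)[0]


def iterative_refinement(seq, n_iters=30, mask_frac=0.3,
                         n_candidates=8, alpha=0.05, seed=0):
    rng = random.Random(seed)
    history = [toy_reward(seq)]
    for _ in range(n_iters):
        masked = partial_noise(seq, mask_frac, rng)                       # step 1: noising
        seq = reward_guided_denoise(masked, n_candidates, alpha, rng)     # step 2: guided denoising
        history.append(toy_reward(seq))
    return seq, history


final_seq, history = iterative_refinement("A" * 20)
print(final_seq, history[0], history[-1])
```

Because each iteration only re-noises part of the sequence, positions that already score well tend to survive, while a poor choice made in one round can be masked and resampled in a later one. This is the sense in which iterative refinement can correct errors that a single-shot pass cannot revisit.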
