Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design

Xingyu Su, Xiner Li, Masatoshi Uehara, Sunwoo Kim, Yulai Zhao, Gabriele Scalia, Ehsan Hajiramezanali, Tommaso Biancalani, Degui Zhi, Shuiwang Ji

TL;DR

This work proposes an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions and demonstrates the effectiveness and superior reward optimization of this approach across diverse tasks in protein, small molecule, and regulatory DNA design.

Abstract

We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective in modeling complex, high-dimensional data distributions, real-world applications often demand more than high-fidelity generation, requiring optimization with respect to potentially non-differentiable reward functions such as physics-based simulation or rewards based on scientific knowledge. Although reinforcement learning (RL) methods have been explored to fine-tune diffusion models for such objectives, they often suffer from instability, low sample efficiency, and mode collapse due to their on-policy nature. In this work, we propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize for arbitrary reward functions. Our method casts the problem as policy distillation: it collects off-policy data during the roll-in phase, simulates reward-based soft-optimal policies during roll-out, and updates the model by minimizing the KL divergence between the simulated soft-optimal policy and the current model policy. Our off-policy formulation, combined with KL divergence minimization, enhances training stability and sample efficiency compared to existing RL-based methods. Empirical results demonstrate the effectiveness and superior reward optimization of our approach across diverse tasks in protein, small molecule, and regulatory DNA design. The source code is released at https://divelab.github.io/VIDD/.
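
The roll-in, roll-out, and KL-update loop described in the abstract can be illustrated with a small tabular toy. This is only a sketch under strong assumptions, not the authors' implementation: the state space is a handful of discrete states, the value function is known exactly, the soft-optimal target is assumed to take the reward-tilted form $p^{\star}(x' \mid x) \propto p^{\mathrm{pre}}(x' \mid x)\exp(v(x')/\alpha)$ and is computed in closed form rather than simulated, and the names `soft_optimal_policy`, `distill`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_optimal_policy(p_pre, value, alpha):
    """Assumed target: p*(x'|x) proportional to p_pre(x'|x) * exp(value[x'] / alpha)."""
    w = p_pre * np.exp((value - value.max()) / alpha)
    return w / w.sum(axis=-1, keepdims=True)

def forward_kl(p_target, p_model):
    """KL(p_target || p_model), averaged over conditioning states (rows).

    Both arguments must be strictly positive row-stochastic matrices.
    """
    per_row = np.sum(p_target * (np.log(p_target) - np.log(p_model)), axis=-1)
    return float(np.mean(per_row))

def distill(p_pre, value, alpha=0.5, beta=0.5, steps=500, batch=32, lr=0.5, seed=0):
    """Toy iterative distillation on a tabular one-step policy (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    k = p_pre.shape[0]
    logits = np.log(p_pre)  # start the model at the pre-trained policy
    target = soft_optimal_policy(p_pre, value, alpha)
    state = rng.integers(k, size=batch)
    for _ in range(steps):
        p_model = softmax(logits)
        # Roll-in (off-policy): each state advances under the current model with
        # probability beta, otherwise under the pre-trained policy.
        use_model = rng.random(batch) < beta
        rows = np.where(use_model[:, None], p_model[state], p_pre[state])
        state = np.array([rng.choice(k, p=r) for r in rows])
        # Forward-KL update on the visited states: the gradient of
        # KL(target || softmax(logits)) with respect to the logits is
        # softmax(logits) - target.
        grad = np.zeros_like(logits)
        np.add.at(grad, state, p_model[state] - target[state])
        logits -= lr * grad / batch
    return softmax(logits), target
```

Calling `distill(np.full((4, 4), 0.25), np.array([0.0, 0.0, 0.0, 1.0]))` returns the distilled policy together with its target. In this toy the update is a plain cross-entropy step toward a fixed target, which is one way to see why a forward-KL distillation objective can behave more like supervised learning than like a policy-gradient update.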

Paper Structure

This paper contains 67 sections, 1 theorem, 23 equations, 7 figures, 24 tables, 1 algorithm.

Key Result

Theorem 1

Let $p^{\theta}_{0:T}$ denote the joint distribution over $\mathcal{X}\times \cdots \times \mathcal{X}$ induced from $t=T$ to $t=0$ by $\{p^{\theta}_t\}$, and let $p^{\star}_{0:T}$ denote the corresponding distribution induced by the soft-optimal policy. Then,

Figures (7)

  • Figure 1: Overview of VIDD. VIDD fine-tunes diffusion models to maximize potentially non-differentiable rewards by iteratively distilling soft-optimal denoising policies. It alternates between (1) off-policy roll-in, (2) value-guided reward-weighted roll-out, and (3) forward KL-based model updates. The algorithm leverages off-policy roll-ins and forward KL minimization, both of which contribute to improved optimization stability.
  • Figure 2: Protein structure visualizations for the PD-L1 and IFNAR2 binding design tasks. The binder protein is shown in green and the target protein in orange, with hotspot residues labeled on the structure.
  • Figure 3: Protein structure visualizations for protein SS-match tasks.
  • Figure 4: Performance vs. diversity in PD-L1 binder design under different roll-in mixtures. Mixing in the pre-trained policy during roll-in (smaller $\beta_s$) increases diversity compared to relying solely on the roll-out policy ($\beta_s{=}1$).
  • Figure 5: Training curves of different methods on SS-match, PD-L1, and IFNAR2 binder design tasks. The $y$-axis shows the optimized reward, and the $x$-axis shows training steps.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Theorem 1