Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design

Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, Avantika Lal, Tommi Jaakkola, Sergey Levine, Hanchen Wang, Aviv Regev

TL;DR

This work tackles the challenge of generating natural-like discrete sequences while optimizing task-specific objectives. It introduces DRAKES, a method that fine-tunes pretrained discrete diffusion models by solving a reward-maximization objective with a KL penalty to preserve naturalness, and makes diffusion trajectories differentiable through the Gumbel-Softmax trick. The authors provide theoretical guarantees linking the learned distribution to a reward-weighted prior and demonstrate empirical superiority on enhancer DNA design and protein stability tasks, with ablations highlighting the beneficial role of the KL term. The approach addresses fundamental differences between CTMC-based discrete diffusion and continuous diffusion, enabling effective design in biology with potential for broader NLP and therapeutics applications. The results suggest substantial practical impact for generating functional, in-distribution sequences in gene therapy and protein engineering.

Abstract

Recent studies have demonstrated the strong empirical performance of diffusion models on discrete sequences across domains from natural language to biological sequence generation. For example, in the protein inverse folding task, conditional diffusion models have achieved impressive results in generating natural-like sequences that fold back into the original structure. However, practical design tasks often require not only modeling a conditional distribution but also optimizing specific task objectives. For instance, we may prefer protein sequences with high stability. To address this, we consider the scenario where we have pre-trained discrete diffusion models that can generate natural-like sequences, as well as reward models that map sequences to task objectives. We then formulate the reward maximization problem within discrete diffusion models, analogous to reinforcement learning (RL), while minimizing the KL divergence against pretrained diffusion models to preserve naturalness. To solve this RL problem, we propose a novel algorithm, DRAKES, that enables direct backpropagation of rewards through entire trajectories generated by diffusion models, by making the originally non-differentiable trajectories differentiable using the Gumbel-Softmax trick. Our theoretical analysis indicates that our approach can generate sequences that are both natural-like and yield high rewards. While similar tasks have been recently explored in diffusion models for continuous domains, our work addresses unique algorithmic and theoretical challenges specific to discrete diffusion models, which arise from their foundation in continuous-time Markov chains rather than Brownian motion. Finally, we demonstrate the effectiveness of DRAKES in generating DNA and protein sequences that optimize enhancer activity and protein stability, respectively, important tasks for gene therapies and protein-based therapeutics.
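The Gumbel-Softmax trick mentioned in the abstract can be illustrated with a minimal, standard-library sketch. It is not the DRAKES implementation: the function name `gumbel_softmax`, the four-letter example distribution, and the temperature values are assumptions chosen for illustration. The idea is to replace a hard categorical sample (non-differentiable) with a softmax over Gumbel-perturbed logits, which approaches a one-hot vector as the temperature decreases.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample over len(logits) categories.

    Illustrative helper (hypothetical name, not taken from the paper's code).
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-position distribution over the nucleotides A, C, G, T.
logits = [math.log(p) for p in (0.1, 0.2, 0.3, 0.4)]

# Same seed, hence same Gumbel noise; only the temperature differs.
soft = gumbel_softmax(logits, temperature=1.0, rng=random.Random(0))
sharp = gumbel_softmax(logits, temperature=0.05, rng=random.Random(0))

print("temperature 1.0 :", [round(p, 3) for p in soft])
print("temperature 0.05:", [round(p, 3) for p in sharp])
```

In an automatic-differentiation framework, the same computation would let gradients of a reward flow back through the relaxed sample to the logits, which is the property the abstract relies on. Taking the argmax of the output recovers an exact categorical sample (the Gumbel-max trick), so the relaxation only softens the sample and does not change which category is favored.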

Paper Structure

This paper contains 42 sections, 6 theorems, 51 equations, 6 figures, 14 tables, 1 algorithm.

Key Result

Theorem 1

When $\{Q^{\theta}_{\cdot,\cdot}:\theta \in \Theta \}$ is fully nonparametric (i.e., realizability holds), the generated distribution at time $T$ by eq:after_finetuned is proportional to

Figures (6)

  • Figure 1: DRAKES. We maximize the reward with a penalty term relative to pre-trained discrete diffusion models using the Gumbel-Softmax trick.
  • Figure 2: Examples of generated proteins. Red: Wild-type backbone structure (the one we condition on), Yellow: Structure predicted by ESMFold from the wild-type (true) sequence, Green: Structure predicted by ESMFold from the sequence generated by DRAKES. The structures for sequences generated by DRAKES show good alignment with the original structure (the scRMSDs are $0.768$ for 7JJK and $0.492$ for 2KRU). Histograms: Gibbs free energy for each generated sequence, calculated using physics-based simulations. In these two cases, the sequences generated by DRAKES appear to be more stable than the baselines.
  • Figure 3: Comparison of HepG2 activity distributions between original sequences and those generated by the pretrained model. The activity distributions match closely with each other.
  • Figure 4: 3-mer and 4-mer Pearson correlation between the original and generated sequences.
  • Figure 5: Distribution of Pred-Activity for the generated sequences of each method.
  • ...and 1 more figure
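The k-mer Pearson correlation named in the Figure 4 caption can be sketched as follows. This is one plausible way to compute such a statistic, not necessarily the paper's exact procedure: the helper names `kmer_frequencies` and `pearson`, the pooling of counts across all sequences, and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k, alphabet="ACGT"):
    """Pooled relative frequency of every possible k-mer, in a fixed order."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[km] / total if total else 0.0 for km in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Toy stand-ins for the original and generated sequence sets (hypothetical data).
original = ["ACGTACGTAC", "TTGACAGGCT"]
generated = ["ACGTACGTTC", "TTGACAGGAT"]

for k in (3, 4):
    r = pearson(kmer_frequencies(original, k), kmer_frequencies(generated, k))
    print(f"{k}-mer Pearson correlation: {r:.3f}")
```

A correlation close to 1 indicates that the generated sequences reproduce the short-range composition of the original set, which is the kind of "naturalness" check the caption describes.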

Theorems & Definitions (10)

  • Remark 1: Sequence of multiple tokens
  • Remark 2: Conditioning
  • Theorem 1: Fine-Tuned Distribution
  • Remark 3
  • Theorem 2: Optimal generator
  • Theorem 3: Feynman–Kac Formula in CTMC
  • Theorem 4: Marginal distribution induced by the optimal generator $Q^{\theta^{\star}}(t)$
  • Lemma 1: Kolmogorov backward equation
  • Lemma 2: Kolmogorov forward equation
  • Remark 4