Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization

Ryan Park, Darren J. Hsu, C. Brian Roland, Maria Korshunova, Chen Tessler, Shie Mannor, Olivia Viessmann, Bruno Trentini

TL;DR

This work fine-tunes ProteinMPNN to produce diverse and structurally consistent peptide sequences via Direct Preference Optimization (DPO), and derives two enhancements to DPO: online diversity regularization and domain-specific priors.

Abstract

Inverse folding models play an important role in structure-based design by predicting amino acid sequences that fold into desired reference structures. Models like ProteinMPNN, a message-passing encoder-decoder model, are trained to reliably produce new sequences from a reference structure. However, when applied to peptides, these models are prone to generating repetitive sequences that do not fold into the reference structure. To address this, we fine-tune ProteinMPNN to produce diverse and structurally consistent peptide sequences via Direct Preference Optimization (DPO). We derive two enhancements to DPO: online diversity regularization and domain-specific priors. Additionally, we develop a new understanding of how to improve diversity in decoder models. When conditioned on OpenFold-generated structures, our fine-tuned models achieve state-of-the-art structural similarity scores, improving on base ProteinMPNN by at least 8%. Compared to standard DPO, our regularized method achieves up to 20% higher sequence diversity with no loss in structural similarity score.
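
For readers unfamiliar with DPO, the following is a minimal sketch of the standard (unregularized) DPO loss for a single preference pair, in its widely published form. It does not reproduce the diversity regularization or the domain-specific priors this paper derives. The function name `dpo_loss` and its argument names are illustrative assumptions, and the inputs are assumed to be summed sequence log-probabilities under the fine-tuned policy and the frozen reference model.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) sequence pair.

    logp_w / logp_l: log-probabilities of the preferred and dispreferred
    sequences under the policy being fine-tuned (hypothetical inputs).
    ref_logp_w / ref_logp_l: the same quantities under the reference model.
    beta: strength of the implicit KL penalty toward the reference model.
    """
    # Implicit reward margin: how much more the policy favors the preferred
    # sequence than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Loss is -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is $\log 2$; the loss falls below that value as the policy shifts probability toward the preferred sequence.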

Paper Structure

This paper contains 19 sections, 13 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: Motivation for DPO design choices. Left. Frequency of each amino acid across ProteinMPNN generations conditioned on the peptide train set, vs. frequency over the peptide sequences. Middle. When conditioned on the same structure, the diversity of sequences generated by base ProteinMPNN does not correlate with the diversity of sequences generated by fine-tuned ProteinMPNN. Right. Distribution of rank correlation coefficient between model log-probabilities and TM-score.
  • Figure 2: Exploring the effect of online diversity optimization. Left. Improvement in diversity across temperatures 0, 0.5, and 1.0, with and without random decoding order. Middle. Sampling distribution entropy (average negative log-probability over samples) over various $\alpha$ values. Right. Entropy of log-probability distribution over validation reference sequences across $\alpha$ sweep.
  • Figure 3: Pareto front and KL divergences. Left. Pareto front for various $\alpha$ values over temperature sweep. Middle. KL divergence from $\pi_\text{ref}$ for various $\alpha$ values. $\alpha=0$ has $\beta=0.5$, all others have $\beta=0.1$. Right. KL divergence for DPO with reward scaling ($\beta=0.1$) and without ($\beta=0.5$).
  • Figure 4: Sequence recovery with diversity optimization. Left. Stronger diversity regularization seems to hurt sequence recovery, though all fine-tuned models improve over ProteinMPNN. Middle. Diversity does not hurt the correlation between log-probabilities and TM-score. Right. Best-of-N sampling allows diverse models to achieve sequence recoveries comparable to standard models.
  • Figure 5: Reward scaling improves DPO. Left. Reward-scaled DPO is a Pareto improvement over standard DPO over a temperature sweep. Middle. Left axis is the KL divergence between the token frequencies in the peptide train set vs. model samples (lower is better), right axis is the fraction of non-repeating tokens (higher is better). Right. TM-score improvement over base DPO. Lower TM-score buckets contain structures for which base ProteinMPNN generated low-quality sequences.
  • ...and 2 more figures
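
Two of the quantities named in the captions above can be sketched in a few lines. This is an illustrative sketch, not the authors' evaluation code: `entropy_estimate` follows the Figure 2 description (average negative log-probability over samples), while `token_freq_kl` is one plausible reading of the Figure 5 token-frequency KL divergence. The function names, the KL direction (reference relative to model), the 20-letter alphabet, and the smoothing constant `eps` are all assumptions.

```python
import math
from collections import Counter


def entropy_estimate(sample_logps):
    """Monte Carlo entropy estimate: the average negative log-probability
    of sequences sampled from the model itself."""
    return -sum(sample_logps) / len(sample_logps)


def token_freq_kl(ref_seqs, model_seqs, alphabet="ACDEFGHIKLMNPQRSTVWY", eps=1e-9):
    """KL(ref || model) between amino acid frequency distributions.

    The direction and the smoothing constant are assumptions; the caption
    only says the divergence is between train-set and model-sample
    token frequencies.
    """
    def freqs(seqs):
        counts = Counter("".join(seqs))
        total = sum(counts[a] for a in alphabet)
        # Additive smoothing keeps every frequency strictly positive.
        return [(counts[a] + eps) / (total + eps * len(alphabet)) for a in alphabet]

    p, q = freqs(ref_seqs), freqs(model_seqs)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A model that samples the same amino acid repeatedly would score a high `token_freq_kl` against a varied training set, which is the failure mode the abstract describes for base ProteinMPNN on peptides.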