Reinforcement Learning for Sequence Design Leveraging Protein Language Models

Jithendaraa Subramanian, Shivakanth Sujit, Niloy Irtisam, Umong Sain, Riashat Islam, Derek Nowrouzezahrai, Samira Ebrahimi Kahou

TL;DR

This work proposes using protein language models as reward functions for RL-based sequence design. Because the full model is expensive to query, optimization is instead performed on scores from a smaller proxy model that is periodically finetuned while the mutation policy is being learned. A modular open source implementation is provided that can be integrated into most RL training loops.

Abstract

Protein sequence design, in which proteins are determined by their amino acid sequences, is essential to protein engineering problems in drug discovery. Prior approaches have resorted to evolutionary strategies or Monte-Carlo methods for protein design, but these often fail to exploit the structure of the combinatorial search space and to generalize to unseen sequences. In the context of discrete black box optimization over large search spaces, learning a mutation policy to generate novel sequences with reinforcement learning is appealing. Recent advances in protein language models (PLMs) trained on large corpora of protein sequences offer a potential solution to this problem by scoring proteins according to their biological plausibility (such as the TM-score). In this work, we propose to use PLMs as a reward function to generate new sequences. Yet the PLM can be computationally expensive to query due to its large size. To this end, we propose an alternative paradigm where optimization is performed on scores from a smaller proxy model that is periodically finetuned while the mutation policy is being learned. We perform extensive experiments on various sequence lengths to benchmark RL-based approaches, and provide comprehensive evaluations of the biological plausibility and diversity of the generated proteins. Our experimental results include favorable evaluations of the proposed sequences, along with high diversity scores, demonstrating that RL is a strong candidate for biological sequence design. Finally, we provide a modular open source implementation that can be easily integrated in most RL training loops, with support for replacing the reward model with other PLMs, to spur further research in this domain. The code for all experiments is provided in the supplementary material.
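
The proxy-reward paradigm described in the abstract can be illustrated with a toy loop. What follows is a minimal sketch under stated assumptions, not the authors' implementation: `oracle_score` is a hypothetical stand-in for an expensive PLM score, `ProxyModel` is a hypothetical per-residue linear model, the policy is a softmax over residue types updated with a REINFORCE-style rule, and the schedule (`finetune_every`, `batch`) and learning rates are illustrative choices.

```python
import math
import random

ALPHABET = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def oracle_score(seq):
    """Hypothetical stand-in for an expensive PLM score such as pTM.
    Toy objective (an assumption): fraction of alanine residues, in [0, 1]."""
    return seq.count("A") / len(seq)


class ProxyModel:
    """Toy proxy (an assumption): one weight per residue type, score = mean weight."""

    def __init__(self):
        self.w = {a: 0.0 for a in ALPHABET}

    def score(self, seq):
        return sum(self.w[a] for a in seq) / len(seq)

    def finetune(self, seqs, targets, lr=0.5, epochs=20):
        # Plain SGD on the squared error between proxy and oracle scores.
        for _ in range(epochs):
            for seq, y in zip(seqs, targets):
                err = self.score(seq) - y
                for a in seq:
                    self.w[a] -= lr * err / len(seq)


def softmax(logits):
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def train(steps=300, finetune_every=50, seq_len=20, batch=16,
          policy_lr=5.0, seed=0):
    rng = random.Random(seed)
    proxy = ProxyModel()
    logits = {a: 0.0 for a in ALPHABET}  # mutation policy parameters
    seq = "".join(rng.choice(ALPHABET) for _ in range(seq_len))
    buffer = [seq]
    oracle_queries = proxy_queries = 0

    for step in range(steps):
        # Periodically label recent sequences with the oracle and finetune the proxy.
        if step % finetune_every == 0:
            recent = buffer[-batch:]
            labels = [oracle_score(s) for s in recent]
            oracle_queries += len(recent)
            proxy.finetune(recent, labels)

        # Sample a single-site mutation from the current policy.
        probs = softmax(logits)
        pos = rng.randrange(seq_len)
        residue = rng.choices(ALPHABET, weights=[probs[a] for a in ALPHABET])[0]
        mutated = seq[:pos] + residue + seq[pos + 1:]

        # The reward comes from the cheap proxy, not from the oracle.
        reward = proxy.score(mutated) - proxy.score(seq)
        proxy_queries += 2

        # REINFORCE-style update of the softmax logits.
        for a in ALPHABET:
            grad = (1.0 if a == residue else 0.0) - probs[a]
            logits[a] += policy_lr * reward * grad

        seq = mutated
        buffer.append(seq)

    return {
        "final_sequence": seq,
        "final_oracle_score": oracle_score(seq),
        "oracle_queries": oracle_queries,
        "proxy_queries": proxy_queries,
    }


if __name__ == "__main__":
    print(train())
```

The point of the sketch is the query budget: the oracle is consulted only at finetuning time, while every policy update is driven by the proxy.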

Paper Structure (19 sections, 5 equations, 7 figures, 11 tables)

Figures (7)

  • Figure 1: Training policies with (a) the oracle and (b) the proxy reward models.
  • Figure 2: Pearson correlation score between the proxy and oracle model. For readability, the curves are smoothed with exponential moving average.
  • Figure 3: Learning curves for optimizing the reward (pTM) from ESMFold on sequences of length 50. The x-axis is the number of queries to the reward model.
  • Figure 4: Learning curves for optimizing the reward from the proxy model on sequences of length 50. Both the proxy pTM and oracle pTM are shown, with the x-axis showing the number of proxy reward queries.
  • Figure 5: Pareto plots for optimization on the finetuned proxy scores, plotted across two structural diversity metrics and one sequence diversity metric (MP-HD). Only methods with at least 0.5 pTM are highlighted. PPO, SAC, and GFlowNets form the Pareto front across all 3 diversity metrics.
  • ...and 2 more figures
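
The agreement measure named in the Figure 2 caption, a Pearson correlation between proxy and oracle scores smoothed with an exponential moving average, can be sketched as follows. This is an illustrative sketch only: the function names and the smoothing factor `alpha` are assumptions, and the paper's exact smoothing settings are not given here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def ema(values, alpha=0.1):
    """Exponential moving average; alpha is a hypothetical smoothing factor."""
    smoothed, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```

In use, `pearson` would be evaluated on a batch of proxy and oracle scores at each logging step, and `ema` applied to the resulting series before plotting.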