Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space

Minji Lee, Luiz Felipe Vecchietti, Hyunkyu Jung, Hyun Joo Ro, Meeyoung Cha, Ho Min Kim

TL;DR

LatProtRL is an optimization method that efficiently traverses a latent space learned by an encoder-decoder built on a large protein language model. To escape local optima, it models optimization as a Markov decision process and uses reinforcement learning to act directly in that latent space.

Abstract

Proteins are complex molecules responsible for different functions in nature. Enhancing the functionality of proteins and cellular fitness can significantly impact various industries. However, protein optimization using computational methods remains challenging, especially when starting from low-fitness sequences. We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model. To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space. We evaluate our approach on two important fitness optimization tasks, demonstrating that it achieves fitness comparable or superior to that of baseline methods. Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.
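To make the idea of a Markov decision process acting in latent space more concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration, not the paper's implementation: `toy_decode` is a hypothetical stand-in for the ESM-2-based decoder (one latent coordinate per residue), the all-zero vector stands in for the encoded wild type, and the rule that a step changing more than `m_step` residues is rejected and ends the episode is only a guess at how a per-step mutation budget (the $m_{\text{step}}$ of Figure 5) could be enforced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_decode(z, wild_type, threshold=1.0):
    """Hypothetical stand-in for the decoder: latent coordinate i controls residue i.

    Coordinates with |z[i]| < threshold keep the wild-type residue; larger values
    select a non-wild-type residue determined by the coordinate's magnitude.
    """
    residues = []
    for wt, value in zip(wild_type, z):
        if abs(value) < threshold:
            residues.append(wt)
        else:
            candidates = [aa for aa in AMINO_ACIDS if aa != wt]
            residues.append(candidates[int(abs(value) * 10) % len(candidates)])
    return "".join(residues)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


class LatentMutationEnv:
    """Toy episodic MDP: state = latent vector, action = additive latent perturbation.

    Rewards are not returned here; in the Figure 1 loop they come from an oracle
    after the trajectories have been collected.
    """

    def __init__(self, wild_type, m_step=2, max_steps=8):
        self.wild_type = wild_type
        self.m_step = m_step        # assumed per-step mutation budget
        self.max_steps = max_steps  # assumed episode horizon

    def reset(self):
        # Hypothetical "encoding" of the wild type: the all-zero latent vector.
        self.z = [0.0] * len(self.wild_type)
        self.seq = toy_decode(self.z, self.wild_type)
        self.t = 0
        return list(self.z)

    def step(self, action):
        """Apply a latent action; return (next_state, done)."""
        z_next = [zi + ai for zi, ai in zip(self.z, action)]
        seq_next = toy_decode(z_next, self.wild_type)
        self.t += 1
        if hamming(self.seq, seq_next) > self.m_step:
            # Assumption: a step changing more than m_step residues is rejected
            # (state and sequence are kept) and terminates the episode.
            return list(self.z), True
        self.z, self.seq = z_next, seq_next
        return list(self.z), self.t >= self.max_steps
```

With `env = LatentMutationEnv("MKTAYIAK", m_step=2)`, calling `env.step([1.5] + [0.0] * 7)` after `env.reset()` mutates only the first residue (giving `"TKTAYIAK"`), while a follow-up step of `[3.0] * 8` would change every residue, exceed the budget, and end the episode with the previous sequence kept.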

Paper Structure

This paper contains 37 sections, 8 figures, 8 tables, and 4 algorithms.

Figures (8)

  • Figure 1: Overview of LatProtRL. At each round, an RL policy $\pi$ acts to collect trajectories $\mathcal{T}$ for a fixed number of episodes. After the trajectories $\mathcal{T}$ are collected, rewards are calculated from the feedback of an oracle. The trajectories with calculated rewards, $\mathcal{T}'$, are used to train the policy with an on-policy RL algorithm.
  • Figure 2: Variant Encoder-Decoder Architecture. Given an input sequence, the encoder computes a representation that the decoder uses to reconstruct the original sequence. CLS denotes the embedding of the classification token in ESM-2.
  • Figure 3: Evaluation metric per optimization round for LatProtRL, AdaLead, and PEX. Shaded regions indicate the standard deviation over 5 runs. The x-axis indicates the optimization round.
  • Figure 4: Optimization trajectories of LatProtRL and AdaLead in the GFP hard task. "Original" denotes the experimental, rugged fitness landscape, which exhibits several local peaks. The x- and y-axes are obtained by multidimensional scaling (MDS) (Kruskal, 1964) of the pairwise distances among 2500 sequences sampled from $\mathcal{D}^*$ together with the 16 sequences optimized by LatProtRL and AdaLead at each round. We chose the median of the 16 sequences but observed a similar tendency for the top 16 sequences. AdaLead generates improved sequences at each round, but they lie farther from the experimental data distribution and have lower fitness than the high-fitness sequences (see $\triangleright$ markers at Round 15). LatProtRL generates sequences closer to the high-fitness sequences in the data distribution (see $\Diamond$ markers at Round 15) while also escaping local optima.
  • Figure 5: Effect of the calibrating steps on fitness and episode length. Calibrating steps allow the policy to learn actions that lead to fewer than $m_{\text{step}}$ mutations, which increases the length of the episodes during training.
  • ...and 3 more figures
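The round structure described in Figure 1 (collect trajectories with the current policy, query the oracle only after collection, then update the policy on-policy) can be sketched on a toy problem. This is not LatProtRL's implementation: the quadratic `oracle`, the state-independent Gaussian policy, and all hyperparameters are hypothetical, and a plain REINFORCE update with a mean-reward baseline stands in for whatever on-policy algorithm the paper actually uses.

```python
import random


def oracle(z, target):
    """Hypothetical oracle: fitness peaks at `target` (stands in for measured fitness)."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def rollout(mu, sigma, horizon, rng):
    """Collect one episode in latent space with a Gaussian policy a ~ N(mu, sigma^2 I)."""
    z = [0.0] * len(mu)
    actions = []
    for _ in range(horizon):
        a = [rng.gauss(m, sigma) for m in mu]
        actions.append(a)
        z = [zi + ai for zi, ai in zip(z, a)]
    return z, actions


def run_rounds(target, rounds=20, episodes=32, horizon=4, sigma=0.1, lr=0.01, seed=0):
    """Return the learned action mean and the mean oracle reward of each round."""
    rng = random.Random(seed)
    mu = [0.0] * len(target)
    history = []
    for _ in range(rounds):
        # 1) Collect trajectories T with the current policy for a fixed number of episodes.
        trajectories = [rollout(mu, sigma, horizon, rng) for _ in range(episodes)]
        # 2) Query the oracle only after collection, turning T into rewarded T'.
        rewards = [oracle(z_final, target) for z_final, _ in trajectories]
        baseline = sum(rewards) / len(rewards)
        history.append(baseline)
        # 3) On-policy update (REINFORCE with a mean-reward baseline) on this round's data only.
        grad = [0.0] * len(mu)
        for (_, actions), r in zip(trajectories, rewards):
            for a in actions:
                for i, (ai, mi) in enumerate(zip(a, mu)):
                    # Gradient of log N(a; mu, sigma^2) with respect to mu_i.
                    grad[i] += (r - baseline) * (ai - mi) / sigma ** 2
        mu = [mi + lr * g / episodes for mi, g in zip(mu, grad)]
    return mu, history
```

For example, `run_rounds([1.0, -0.5])` should show the per-round mean reward rising as the policy's mean step approaches `target / horizon`. Because the data from each round is discarded after its update, the sketch stays on-policy, matching the loop described in Figure 1.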