Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space

Minji Lee; Luiz Felipe Vecchietti; Hyunkyu Jung; Hyun Joo Ro; Meeyoung Cha; Ho Min Kim

TL;DR

LatProtRL is an optimization method that efficiently traverses a latent space learned by an encoder-decoder built on a large protein language model. The optimization is modeled as a Markov decision process, with a reinforcement learning agent acting directly in latent space to escape local optima.

Abstract

Proteins are complex molecules responsible for different functions in nature. Enhancing the functionality of proteins and cellular fitness can significantly impact various industries. However, protein optimization using computational methods remains challenging, especially when starting from low-fitness sequences. We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model. To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space. We evaluate our approach on two important fitness optimization tasks, demonstrating its ability to achieve comparable or superior fitness over baseline methods. Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.

Paper Structure (37 sections, 8 figures, 8 tables, 4 algorithms)

Introduction
Related Works
Protein Fitness Optimization
Methodology
Problem Formulation
Optimization in Latent Space
Variant Encoder-Decoder (VED)
Encoder Architecture
Decoder Architecture
Constrained Decoding
Protein Fitness Optimization via Model-Based RL
Frontier Buffer
Results
Experiment Setup
Datasets and Oracles
...and 22 more sections

Figures (8)

Figure 1: Overview of LatProtRL. At each round, an RL policy $\pi$ acts to collect trajectories $\mathcal{T}$ for a fixed number of episodes. After $\mathcal{T}$ are collected, the reward is calculated based on the feedback from an oracle. The trajectories with calculated rewards, $\mathcal{T}'$, are used to train the policy using an on-policy RL algorithm.
Figure 2: Variant Encoder-Decoder Architecture. Given an input sequence, the encoder calculates a representation that is used by the decoder to reconstruct the original sequence. The term CLS represents the embeddings for the classification token in ESM-2.
Figure 3: Evaluation metric by optimization round for LatProtRL, AdaLead and PEX. Shaded regions indicate the standard deviation of 5 runs. The x-axis indicates the number of rounds.
Figure 4: Optimization trajectories of LatProtRL and AdaLead in the GFP hard task. The term "original" indicates the experimental rugged fitness landscape, which exhibits several local peaks. The x- and y-axes are obtained by multidimensional scaling (MDS; Kruskal, 1964) of pairwise distances of 2500 sequences sampled from $\mathcal{D}^*$ together with 16 optimized sequences from LatProtRL and AdaLead at each round. We chose the median of 16 sequences but observed a similar tendency for the top 16 sequences. AdaLead generates improved sequences at each round, but they lie farther from the experimental data distribution and have lower fitness values than high-fitness sequences (see $\triangleright$ markers at Round 15). LatProtRL generates sequences closer to high-fitness sequences in the data distribution (see $\Diamond$ markers at Round 15) while also escaping local optima.
Figure 5: Effect of the calibrating steps on fitness and episode length. Calibrating steps allow the policy to learn actions that lead to fewer than $m_{\text{step}}$ mutations and that increase episode length during training.
...and 3 more figures
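
The round structure described in the Figure 1 caption (collect trajectories, score them with an oracle, then update the policy on-policy) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the latent space is a plain 2-D vector rather than an encoder output, `toy_oracle` is a hypothetical quadratic fitness function, and a basic REINFORCE update stands in for whatever on-policy algorithm the paper uses.

```python
import random

def toy_oracle(z):
    # Hypothetical fitness oracle (assumption): a single peak at z = (1, 1).
    return -sum((zi - 1.0) ** 2 for zi in z)

def collect_trajectories(mean, sigma, start, n_episodes, horizon, rng):
    """Roll out a Gaussian policy; each action is a small step in latent space."""
    trajectories = []
    for _ in range(n_episodes):
        z, actions = list(start), []
        for _ in range(horizon):
            action = [rng.gauss(m, sigma) for m in mean]
            z = [zi + ai for zi, ai in zip(z, action)]
            actions.append(action)
        trajectories.append({"actions": actions, "final": z})
    return trajectories

def add_rewards(trajectories, oracle):
    """Query the oracle only after collection, once per trajectory."""
    return [dict(t, reward=oracle(t["final"])) for t in trajectories]

def reinforce_update(mean, sigma, scored, lr):
    """One REINFORCE step on the policy mean, using a mean-reward baseline."""
    baseline = sum(t["reward"] for t in scored) / len(scored)
    grad = [0.0] * len(mean)
    for t in scored:
        advantage = t["reward"] - baseline
        for action in t["actions"]:
            for i, a in enumerate(action):
                # Gradient of log N(a; mean, sigma^2) with respect to mean.
                grad[i] += advantage * (a - mean[i]) / sigma ** 2
    return [m + lr * g / len(scored) for m, g in zip(mean, grad)]

def run_rounds(n_rounds=30, n_episodes=32, horizon=4, sigma=0.1, lr=0.01, seed=0):
    rng = random.Random(seed)
    mean, start = [0.0, 0.0], [0.0, 0.0]
    history = []  # average oracle reward per round
    for _ in range(n_rounds):
        trajs = collect_trajectories(mean, sigma, start, n_episodes, horizon, rng)
        scored = add_rewards(trajs, toy_oracle)
        mean = reinforce_update(mean, sigma, scored, lr)
        history.append(sum(t["reward"] for t in scored) / len(scored))
    return mean, history

if __name__ == "__main__":
    mean, history = run_rounds()
    print("first round reward:", history[0])
    print("last round reward:", history[-1])
```

In this toy setting the policy only learns a mean step direction. The sketch is meant to show the ordering of collection, oracle scoring, and policy update within a round, not the encoder-decoder, constrained decoding, or frontier buffer listed in the outline.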
