PepEVOLVE: Position-Aware Dynamic Peptide Optimization via Group-Relative Advantage

Trieu Nguyen, Hao-Wei Pang, Shasha Feng

TL;DR

PepEVOLVE addresses multi-objective lead optimization of macrocyclic peptides with three components: dynamic pretraining with masking and CHUCKLES shifting, a context-free bandit router that automatically identifies high-impact edit sites, and an evolving optimization loop that uses group-relative advantage to stabilize reinforcement learning. On a therapeutically relevant Rev-binding macrocycle benchmark, it converges faster and produces higher-quality outputs than PepINVENT, with mean scores of approximately $0.8$ and best candidates near $0.95$, versus approximately $0.6$ and $0.87$ for PepINVENT. By removing static assumptions about which positions are editable and letting sequences evolve during optimization, PepEVOLVE provides a practical, reproducible route to multi-parameter optimization (MPO) in peptide design. The result is a principled, data-driven way to navigate macrocyclic peptide space more efficiently, with potential impact on lead optimization workflows in peptide therapeutics.

Abstract

Macrocyclic peptides are an emerging modality that combines biologics-like affinity with small-molecule-like developability, but their vast combinatorial space and multi-parameter objectives make lead optimization slow and challenging. Prior generative approaches such as PepINVENT require chemists to pre-specify mutable positions for optimization, choices that are not always known a priori, and rely on static pretraining and optimization algorithms that limit the model's ability to generalize and effectively optimize peptide sequences. We introduce PepEVOLVE, a position-aware, dynamic framework that learns both where to edit and how to dynamically optimize peptides for multi-objective improvement. PepEVOLVE (i) augments pretraining with dynamic masking and CHUCKLES shifting to improve generalization, (ii) uses a context-free multi-armed bandit router that discovers high-reward residues, and (iii) couples a novel evolving optimization algorithm with group-relative advantage to stabilize reinforcement updates. During in silico evaluations, the router policy reliably learns and concentrates probability on chemically meaningful sites that influence the peptide's properties. On a therapeutically motivated Rev-binding macrocycle benchmark, PepEVOLVE outperformed PepINVENT by reaching higher mean scores (approximately 0.8 vs. 0.6), achieving best candidates with a score of 0.95 (vs. 0.87), and converging in fewer steps under the task of optimizing permeability and lipophilicity with structural constraints. Overall, PepEVOLVE offers a practical, reproducible path to peptide lead optimization when optimal edit sites are unknown, enabling more efficient exploration and improving design quality across multiple objectives.
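To make the CHUCKLES-shifting idea concrete, the minimal Python sketch below rotates the residues of a cyclic peptide written in a toy pipe-delimited notation. The notation and the helper names `shift_residues` and `all_rotations` are assumptions made for illustration. Real CHUCKLES strings also encode ring-closure atoms that a plain rotation does not handle, and this is not PepEVOLVE's implementation.

```python
def shift_residues(seq: str, k: int = 1, sep: str = "|") -> str:
    """Rotate the residues of a toy cyclic peptide k positions to the right."""
    residues = seq.split(sep)
    k %= len(residues)  # a full turn around the ring changes nothing
    if k == 0:
        return seq
    # Move the last k residues to the front; the ring itself is unchanged.
    return sep.join(residues[-k:] + residues[:-k])


def all_rotations(seq: str, sep: str = "|") -> list[str]:
    """Every rotated spelling of the same toy cyclic peptide."""
    n = len(seq.split(sep))
    return [shift_residues(seq, k, sep) for k in range(n)]


print(shift_residues("A|B|C|D"))  # D|A|B|C
print(all_rotations("A|B|C|D"))
```

Because every rotation spells the same ring, drawing a random shift per training example is one plausible way to expose a model to all equivalent spellings rather than a single canonical one.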

Paper Structure

This paper contains 29 sections, 19 equations, 11 figures.

Figures (11)

  • Figure 1: Sampling distribution for pretraining masks. Histogram of the number of masked residue positions $n_{\text{mask}}$ produced by PepEVOLVE's triangular sampling during pretraining. This ensures that the generative model is adept at generating single-residue mutations.
  • Figure 2: A toy example of CHUCKLES shifting. Two different CHUCKLES strings of the same cyclic peptide are shown. The second sequence is shifted one position to the right, illustrating rotational invariance for cyclic peptides.
  • Figure 3: Validation loss for three pretraining strategies: static masking, dynamic masking, and dynamic masking + dynamic CHUCKLES shifting. Left: evaluation on unshifted (standard) sequences. Right: evaluation on CHUCKLES-shifted sequences.
  • Figure 4: Evolving phase algorithm with group-relative advantage (GRA). Example of neighbor-masking using the input sequence $\texttt{"A|B|C|D"}$ with a set of optimized positions $O = \{0,2,3\}$. At each step, the top-$K$ peptides (here $K=2$) from the previous step are selected as seed inputs. For each seed, $|O|=3$ input contexts are extracted. In every context, $G=8$ candidate sequences are generated per seed, and their rewards are computed and normalized using the group-relative advantage.
  • Figure 5: Router policy convergence for minimizing the number of hydrogen-bond donors. (A) Router selection probabilities over routing steps when the objective is to minimize the total number of hydrogen-bond donors in a synthetic peptide. (B) Probability of the router identifying one optimal position. (C) Probability of the router identifying two optimal positions.
  • ...and 6 more figures
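
The reward normalization described in the Figure 4 caption can be sketched as follows. The sketch assumes the common GRPO-style definition of group-relative advantage: each reward minus the group mean, divided by the group standard deviation. This summary does not show the paper's exact formula, so the function name, the `eps` stabilizer, and the example rewards are hypothetical. The candidate count follows one reading of the caption, in which each of the $K$ seeds yields $|O|$ contexts and each context yields $G$ candidates.

```python
import statistics


def group_relative_advantage(rewards, eps=1e-8):
    """Normalize rewards within one group of G candidates (assumed GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every candidate scores the same.
    return [(r - mean) / (std + eps) for r in rewards]


# Settings taken from the Figure 4 caption.
K = 2          # seeds kept from the previous step
O = (0, 2, 3)  # optimized positions, one context per position
G = 8          # candidates generated per context

# Candidates scored per step under the reading described above.
candidates_per_step = K * len(O) * G
print(candidates_per_step)  # 48

# Arbitrary illustrative rewards for one group of G candidates (not paper data).
rewards = [0.2, 0.4, 0.4, 0.5, 0.6, 0.6, 0.7, 0.8]
advantages = group_relative_advantage(rewards)
print([round(a, 2) for a in advantages])
```

Normalizing within each group compares a candidate only against others generated from the same context, so an easy context with uniformly high rewards does not dominate the update.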