PepEVOLVE: Position-Aware Dynamic Peptide Optimization via Group-Relative Advantage
Trieu Nguyen, Hao-Wei Pang, Shasha Feng
TL;DR
PepEVOLVE tackles multi-objective lead optimization of macrocyclic peptides with three components: dynamic pretraining with masking and CHUCKLES shifting, a context-free bandit router that automatically identifies high-impact edit sites, and an evolving optimization loop that uses group-relative advantage for stable reinforcement learning. On a therapeutically relevant Rev-binding macrocycle benchmark, the framework converges faster and produces higher-quality outputs than PepINVENT, with mean scores around 0.8 and best candidates near 0.95, versus PepINVENT’s approximately 0.6 and 0.87. By removing static assumptions about editable positions and letting sequences evolve during optimization, PepEVOLVE provides a practical, reproducible route to multi-parameter optimization (MPO) in peptide design. It offers a principled, data-driven way to navigate macrocyclic peptide space more efficiently, with potential impact on lead optimization workflows in peptide therapeutics.
Abstract
Macrocyclic peptides are an emerging modality that combines biologics-like affinity with small-molecule-like developability, but their vast combinatorial space and multi-parameter objectives make lead optimization slow and challenging. Prior generative approaches such as PepINVENT require chemists to pre-specify the mutable positions, a choice that is not always known a priori, and rely on static pretraining and optimization algorithms that limit the model's ability to generalize and to optimize peptide sequences effectively. We introduce PepEVOLVE, a position-aware, dynamic framework that learns both where to edit and how to optimize peptides for multi-objective improvement. PepEVOLVE (i) augments pretraining with dynamic masking and CHUCKLES shifting to improve generalization, (ii) uses a context-free multi-armed bandit router that discovers high-reward residues, and (iii) couples a novel evolving optimization algorithm with group-relative advantage to stabilize reinforcement updates. In in silico evaluations, the router policy reliably learns to concentrate probability on chemically meaningful sites that influence the peptide's properties. On a therapeutically motivated Rev-binding macrocycle benchmark, where the task is to optimize permeability and lipophilicity under structural constraints, PepEVOLVE outperformed PepINVENT: it reached higher mean scores (approximately 0.8 vs. 0.6), produced best candidates with a score of 0.95 (vs. 0.87), and converged in fewer steps. Overall, PepEVOLVE offers a practical, reproducible path to peptide lead optimization when optimal edit sites are unknown, enabling more efficient exploration and improving design quality across multiple objectives.
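The abstract names components (ii) and (iii) without giving their update rules, so the following is only a minimal Python sketch of how a context-free bandit router and a group-relative advantage could interact. It assumes the common formulation in which each reward is centered and scaled by the mean and standard deviation of its sampling group, and it assumes a softmax gradient-bandit update for the router. The names `group_relative_advantage`, `SoftmaxRouter`, `toy_reward`, and `train_toy_router`, along with the learning rate, group size, and reward values, are hypothetical and are not taken from PepEVOLVE.

```python
import math
import random
from statistics import fmean, pstdev


def group_relative_advantage(rewards, eps=1e-8):
    """Center and scale rewards within one group of sampled candidates.

    Assumed form: (r - group mean) / (group std + eps).
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


class SoftmaxRouter:
    """Context-free bandit: one learnable preference per peptide position."""

    def __init__(self, n_positions, lr=0.1, seed=0):
        self.prefs = [0.0] * n_positions
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self):
        # Numerically stable softmax over the position preferences.
        m = max(self.prefs)
        exps = [math.exp(p - m) for p in self.prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self):
        positions = range(len(self.prefs))
        return self.rng.choices(positions, weights=self.probs())[0]

    def update(self, position, advantage):
        # Gradient-bandit step: d log pi(a) / d pref_k = 1[k == a] - pi(k).
        pi = self.probs()
        for k in range(len(self.prefs)):
            indicator = 1.0 if k == position else 0.0
            self.prefs[k] += self.lr * advantage * (indicator - pi[k])


def toy_reward(position):
    # Hypothetical stand-in for a scoring function: editing position 2 helps most.
    return 1.0 if position == 2 else 0.2


def train_toy_router(n_positions=5, n_rounds=200, group_size=8):
    router = SoftmaxRouter(n_positions)
    for _ in range(n_rounds):
        group = [router.sample() for _ in range(group_size)]
        rewards = [toy_reward(p) for p in group]
        advantages = group_relative_advantage(rewards)
        for position, adv in zip(group, advantages):
            router.update(position, adv)
    return router
```

Calling `train_toy_router().probs()` returns the router's learned distribution over positions. In this toy setting the advantage is zero whenever every candidate in a group receives the same reward, so the router changes only when the group contains a contrast between edit sites, which is one plausible reading of why a group-relative baseline stabilizes updates.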
