GeoPep: A geometry-aware masked language model for protein-peptide binding site prediction
Dian Chen, Yunkai Chen, Tong Lin, Sijie Chen, Xiaolin Cheng
TL;DR
GeoPep predicts peptide-binding sites on proteins, a task made difficult by peptide flexibility and limited structural data. It transfers knowledge from the multimodal ESM3 foundation model and augments it with parameter-efficient Kolmogorov-Arnold Networks and distance-based geometric losses. The method builds on ESM3’s integrated sequence–structure representations and enforces spatial coherence through a geometry-aware objective, achieving state-of-the-art performance on peptide–protein benchmarks and more accurate geometric localization of interfaces. Structural evaluations and comparisons with existing methods indicate that GeoPep is robust to induced-fit interfaces and generalizes beyond pre-formed pockets, which suggests potential for peptide therapeutics design and integration into drug discovery pipelines. The work highlights the value of combining foundation-model transfer learning with geometry-aware regularization for specialized molecular interaction tasks. It also acknowledges data limitations and suggests dataset expansion and affinity-oriented extensions as future directions.
Abstract
Multimodal approaches that integrate protein structure and sequence have achieved remarkable success in protein-protein interface prediction. However, extending these methods to protein-peptide interactions remains challenging because of the inherent conformational flexibility of peptides and the limited availability of structural data, both of which hinder direct training of structure-aware models. To address these limitations, we introduce GeoPep, a novel framework for peptide binding site prediction that leverages transfer learning from ESM3, a multimodal protein foundation model. To compensate for the scarcity of protein-peptide binding data, GeoPep fine-tunes ESM3's rich pre-learned representations from protein-protein binding. The fine-tuned model is then integrated with a parameter-efficient neural network architecture capable of learning complex patterns from sparse data. In addition, the model is trained with distance-based loss functions that exploit 3D structural information to enhance binding site prediction. Comprehensive evaluations demonstrate that GeoPep significantly outperforms existing methods in protein-peptide binding site prediction by effectively capturing sparse and heterogeneous binding patterns.
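The abstract does not give the exact form of the distance-based loss, so the following is only an illustrative sketch of how such an objective might work, not GeoPep's actual formulation. It assumes a per-residue binary cross-entropy term plus a penalty that grows with the predicted binding probability of residues lying far from any true binding residue. The function names, the weight `lam`, and the use of one 3D coordinate per residue are all hypothetical choices made for this example.

```python
import math

def min_dist_to_binding(coords, labels):
    """Distance from each residue to its nearest true binding residue.

    True binding residues get 0.0. If there are no binding residues,
    every distance is 0.0 so the penalty term vanishes.
    """
    binders = [c for c, y in zip(coords, labels) if y == 1]
    if not binders:
        return [0.0] * len(coords)
    return [min(math.dist(c, b) for b in binders) for c in coords]

def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over residues, with clipping for stability."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)

def geometry_aware_loss(probs, labels, coords, lam=0.1):
    """Assumed objective: BCE plus a distance-weighted false-positive penalty.

    The penalty is the mean of (predicted probability * distance to the
    nearest true binding residue), so confident predictions far from the
    real interface cost more than ones adjacent to it.
    """
    dists = min_dist_to_binding(coords, labels)
    penalty = sum(p * d for p, d in zip(probs, dists)) / len(probs)
    return bce(probs, labels) + lam * penalty

# Toy example: one binding residue at the origin, two non-binding residues.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
labels = [1, 0, 0]
near_miss = [0.9, 0.6, 0.1]  # false positive next to the interface
far_miss = [0.9, 0.1, 0.6]   # same false positive, but 10 units away
loss_near = geometry_aware_loss(near_miss, labels, coords)
loss_far = geometry_aware_loss(far_miss, labels, coords)
```

In this toy case both predictions have the same cross-entropy, so only the distance term separates them: the false positive far from the interface yields the larger loss. This is the kind of spatial coherence a geometry-aware objective is meant to encourage.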
