PepGB: Facilitating peptide drug discovery via graph neural networks
Yipin Lei, Xu Wang, Meng Fang, Han Li, Xiang Li, Jianyang Zeng
TL;DR
PepGB targets key bottlenecks in early-stage peptide drug discovery by predicting peptide-protein interactions on a heterogeneous graph, using graph attention networks with a DropMessage perturbation and a dual-view loss to mitigate overfitting and data imbalance. A contrastive pre-training strategy learns robust peptide representations from a large unlabeled sequence corpus, improving generalization to novel targets and hits. To handle the imbalanced data typical of lead generation, diPepGB introduces directed edges that encode relative binding strength, which enables modeling under realistic assay conditions and supports virtual alanine scanning. In rigorous cluster-based validations, PepGB outperforms baselines in novel settings, while diPepGB performs strongly on imbalanced data and real-world lead optimization tasks, underscoring the potential of both frameworks to accelerate early-stage peptide drug discovery.
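As a rough illustration of the message-level perturbation mentioned above, the following minimal sketch applies a DropMessage-style mask to a per-edge message matrix before aggregation. It is not the PepGB implementation: the function names `drop_message` and `aggregate_sum`, the toy graph, and the use of sum aggregation are assumptions made only for this example.

```python
import random


def drop_message(messages, drop_rate, rng):
    """Independently zero each element of the per-edge message matrix.

    messages: list of feature vectors, one row per directed edge.
    Surviving elements are rescaled by 1 / (1 - drop_rate) so that the
    expected value of every element is unchanged.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    return [
        [0.0 if rng.random() < drop_rate else value * scale for value in row]
        for row in messages
    ]


def aggregate_sum(edges, messages, num_nodes):
    """Sum the (possibly perturbed) messages arriving at each target node."""
    dim = len(messages[0]) if messages else 0
    out = [[0.0] * dim for _ in range(num_nodes)]
    for (_, dst), row in zip(edges, messages):
        for k, value in enumerate(row):
            out[dst][k] += value
    return out


# Hypothetical toy graph: edges are (source, target) node indices.
edges = [(0, 2), (1, 2), (2, 0)]
messages = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

rng = random.Random(0)  # fixed seed so the perturbation is reproducible
perturbed = drop_message(messages, drop_rate=0.5, rng=rng)
node_inputs = aggregate_sum(edges, perturbed, num_nodes=3)
```

Masking individual message elements, rather than whole nodes or whole edges, is what makes the perturbation fine-grained: two edges that leave the same node can lose different feature dimensions.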
Abstract
Peptides offer great biomedical potential and are promising drug candidates. Currently, most approved peptide drugs are derived directly from well-explored natural human peptides. Identifying novel peptide drugs in the vast, unexplored biochemical space calls for advanced deep learning techniques. Although various in silico methods have been developed to accelerate early-stage peptide drug discovery, existing models suffer from overfitting and poor generalizability because the available experimental data are limited in size, imbalanced in distribution and inconsistent in quality. In this study, we propose PepGB, a deep learning framework that facilitates early-stage peptide drug discovery by predicting peptide-protein interactions (PepPIs). Built on graph neural networks, PepGB combines a fine-grained perturbation module and a dual-view objective with peptide representations pre-trained by contrastive learning. Through rigorous evaluations, we demonstrate that PepGB greatly outperforms baselines and accurately identifies PepPIs for novel targets and peptide hits, thereby contributing to target identification and hit discovery. We then derive an extended version, diPepGB, to tackle the bottleneck of modeling the highly imbalanced data prevalent in lead generation and optimization. By using directed edges to represent the relative binding strength between two peptide nodes, diPepGB achieves superior performance in real-world assays. In summary, the proposed frameworks can serve as potent tools for early-stage peptide drug discovery.
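To make the directed-edge idea concrete, here is a minimal sketch of how pairwise comparisons of binding strength could be turned into directed edges between peptide nodes. It is an illustration under stated assumptions, not the diPepGB construction: the function `build_directed_edges`, the `margin` parameter, the convention that an edge points from the stronger binder to the weaker one, and the toy affinity values are all hypothetical.

```python
def build_directed_edges(affinity, margin=0.0):
    """Return directed edges (stronger, weaker) between peptide pairs.

    affinity: dict mapping a peptide name to its binding strength against
    one target, where a higher value means stronger binding (assumption).
    Pairs whose difference does not exceed `margin` get no edge, so
    near-ties within measurement noise are left unordered.
    """
    names = sorted(affinity)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = affinity[a] - affinity[b]
            if diff > margin:
                edges.append((a, b))  # a binds more strongly than b
            elif diff < -margin:
                edges.append((b, a))  # b binds more strongly than a
    return edges


# Toy values for illustration only (not experimental data): a parent
# peptide and two hypothetical alanine-substituted variants.
toy_affinity = {"wild_type": 7.2, "A3_mutant": 5.9, "A5_mutant": 7.1}
toy_edges = build_directed_edges(toy_affinity, margin=0.2)
```

Because each edge records only which peptide of a pair binds more strongly, every pair of measured peptides can contribute a training signal even when few of them count as strong binders, which is one way a relative formulation can sidestep class imbalance.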
