Effective Protein-Protein Interaction Exploration with PPIretrieval
Chenqing Hua, Connor Coley, Guy Wolf, Doina Precup, Shuangjia Zheng
TL;DR
PPIretrieval introduces a first-of-its-kind deep learning pipeline that learns rich surface-oriented embeddings for proteins and performs retrieval-based exploration of protein-protein interactions. It combines a Laplace-Beltrami heat-diffusion-based surface encoder with an interactive cross-attention decoder and is trained with lock-and-key, contrastive, and binding-interface objectives. For an unseen query, the model retrieves likely binding partners and predicts their interfaces. Empirical results across multiple datasets show that increasing database size and search breadth improves docking-quality metrics (DockQ, TM-score) and interface prediction, with demonstrated cross-dataset generalization and practical case studies. This approach offers a scalable, end-to-end retrieval framework for rapid PPI exploration and potential discovery of novel protein interactions, with implications for drug discovery and protein engineering.
Abstract
Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transport, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast universe of protein complexes to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space that captures rich geometric and chemical information of protein surfaces. When provided with an unseen query protein and its associated binding site, PPIretrieval identifies a potential binding partner along with its corresponding binding site in the embedding space, facilitating the formation of protein-protein complexes.
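To make the idea of searching an embedding space concrete, the following is a minimal sketch of generic nearest-neighbor retrieval by cosine similarity. It is an illustration only, not the PPIretrieval implementation: the function names (`cosine_similarity`, `retrieve_top_k`), the three-dimensional toy vectors, and the choice of cosine similarity as the scoring function are all assumptions made for this example. In the model described here, embeddings would come from the learned surface encoder rather than being written by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k database entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical, hand-written embeddings standing in for encoder outputs.
database = {
    "partner_A": [1.0, 0.0, 0.0],
    "partner_B": [0.0, 1.0, 0.0],
    "partner_C": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]

hits = retrieve_top_k(query, database, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

Raising `k` here plays the role of a wider search breadth: more candidates are returned per query, at the cost of scoring and inspecting more entries.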
