Effective Protein-Protein Interaction Exploration with PPIretrieval

Chenqing Hua, Connor Coley, Guy Wolf, Doina Precup, Shuangjia Zheng

TL;DR

PPIretrieval introduces a first-of-its-kind deep learning pipeline that learns rich surface-oriented embeddings for proteins and performs retrieval-based exploration of protein-protein interactions. By combining a Laplace-Beltrami heat-diffusion-based surface encoder with an interactive cross-attention decoder, coupled with lock-and-key, contrastive, and binding-interface objectives, the model retrieves likely binding partners and predicts their interfaces for unseen queries. Empirical results across multiple datasets show that increasing database size and search breadth improves docking-quality metrics (dockQ, TM) and interface prediction, with demonstrated cross-dataset generalization and practical case studies. This approach offers a scalable, end-to-end retrieval framework for rapid PPI exploration and potential discovery of novel protein interactions, with implications for drug discovery and protein engineering.
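
For orientation, the heat diffusion that a Laplace-Beltrami-based surface encoder builds on can be written in its standard spectral form. This is a sketch of the textbook formulation, not the paper's exact parameterization: the symbols $h_0$ (initial per-point surface feature), $t$ (diffusion time), and $(\lambda_i, \phi_i)$ (eigenpairs of the operator) are assumptions introduced here and do not appear in this summary.

```latex
\begin{align}
% Heat equation on the surface; \Delta is the Laplace-Beltrami operator
\frac{\partial h}{\partial t} &= \Delta h, \qquad h(\cdot, 0) = h_0, \\
% Eigenpairs of -\Delta, with non-negative eigenvalues
-\Delta \phi_i &= \lambda_i \phi_i, \qquad \lambda_i \ge 0, \\
% Closed-form solution: each spectral coefficient decays exponentially in t
h_t &= \sum_i e^{-\lambda_i t} \, \langle \phi_i, h_0 \rangle \, \phi_i .
\end{align}
```

Under this formulation, a small $t$ keeps features local, while a large $t$ spreads them across the surface, so the diffusion time controls how much surface context each point sees.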

Abstract

Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.

Paper Structure (23 sections, 13 equations, 10 figures, 14 tables)

Figures (10)

  • Figure 1: An overview of the PPIretrieval pipeline, showing the training and inference workflows. During training, PPIretrieval processes a PPI complex. The encoder network (shown in grey) encodes the two proteins, generating surface features $\mathbf{H}_R,\mathbf{H}_L$. These features, along with information about their binding partnership, are stored in our database. The decoder network (shown in brown) then takes these surface features, along with the receptor's binding interface, as input, predicting the ligand's binding interface and generating its embedding $\mathbf{F}_L$, and vice versa. During inference, PPIretrieval passes a protein $P$ with its corresponding binding interface to the encoder network, which encodes $P$ into a surface feature $\mathbf{H}_P$. Then, PPIretrieval identifies a surface feature $\mathbf{H}_B$ for binding partner $B$ in our database. The decoder network takes the surface features $\mathbf{H}_P, \mathbf{H}_B$, along with $P$'s binding interface, and predicts $B$'s binding interface. A PPI complex is predicted between the input protein $P$ with the given binding interface and the predicted binding partner $B$ with the predicted binding interface, as demonstrated above. Details of PPIretrieval can be found in Sec. \ref{sec:PPIsearch}, with an overview discussed in Sec. \ref{sec:ppisearch.pipeline.overview}.
  • Figure 2: Visualization of PPIretrieval results for proteins in the PDB test set, evaluated by dockQ. Proteins colored in blue are input query proteins; proteins colored in red are binding partners. Left column displays the ground-truth structures; right column shows the structures predicted by PPIretrieval.
  • Figure 3: Evaluation of PPI binding sites during inference. For a PPI in the test set, a query protein with a known binding site $\mathbf{Y}_\text{query}$ seeks a binding partner whose actual binding site is $\mathbf{Y}_\text{true}$. We assume that the binding partner is unknown, so PPIretrieval aims to retrieve a potential binding partner from the surface database. PPIretrieval identifies protein $A$ in the surface database, which has the surface representation most similar to that of the query protein. Protein $A$ has a known binding partner $B$ with a reference binding site $\mathbf{Y}_B^{\text{ref}}$ (stored in the database), a binding site $\mathbf{Y}_B^{\text{masif}}$ predicted by masif, and a binding site $\mathbf{Y}_B^{\text{dmasif}}$ predicted by dmasif. PPIretrieval takes the query protein and $B$ as input and predicts a new binding site $\mathbf{Y}_B^{\text{pred}}$. For each $\mathbf{Y} \in \{\mathbf{Y}_B^{\text{pred}}, \mathbf{Y}_B^{\text{masif}}, \mathbf{Y}_B^{\text{dmasif}}, \mathbf{Y}_B^{\text{ref}}\}$, we compute $\textit{dockQ}(\mathbf{Y}_\text{true}, \mathbf{Y})$, $\textit{TM}(\mathbf{Y}_\text{true}, \mathbf{Y})$, and $\textit{rmsd}(\mathbf{Y}_\text{true}, \mathbf{Y})$ to evaluate and compare the quality of the PPIs and binding interfaces. Here, $\mathbf{Y}_\text{true}$ denotes the known binding site of the ground-truth binding partner; $\mathbf{Y}^\text{ref}_B$ denotes the known binding site (stored in the database) of the retrieved binding partner; $\mathbf{Y}^\text{pred}_B$ denotes the predicted binding site of the retrieved binding partner.
  • Figure 4: Case study using PPIretrieval. For the query protein, highlighted in blue, PPIretrieval successfully identifies a binding partner within our surface database.
  • Figure 5: Visualization of PPIretrieval results for proteins in the PDB test set, evaluated by dockQ. Proteins colored in blue are input query proteins; proteins colored in red are binding partners. Left column displays the ground-truth structures; right column shows the structures predicted by PPIretrieval.
  • ...and 5 more figures
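
The retrieval step described in the captions of Figures 1 and 3 (find the database protein $A$ whose surface representation is most similar to the query, then return $A$'s known partner $B$) can be sketched as a nearest-neighbour search. The sketch below is illustrative only and rests on several assumptions not stated in this summary: per-point surface features are mean-pooled into one vector per protein, similarity is cosine similarity, and the names `pool_surface`, `retrieve_partners`, `partner_of`, and `top_k` are hypothetical. The encoder and decoder networks are not modelled; random arrays stand in for encoder outputs.

```python
import numpy as np


def pool_surface(h: np.ndarray) -> np.ndarray:
    """Mean-pool per-point surface features (n_points, dim) into a unit vector.

    Assumption: mean pooling is used here only for illustration.
    """
    v = h.mean(axis=0)
    return v / np.linalg.norm(v)


def retrieve_partners(query_h, db_embeddings, partner_of, top_k=1):
    """Return (partner_index, similarity) for the top_k most similar entries.

    db_embeddings: (n_proteins, dim) array of unit-norm pooled embeddings.
    partner_of:    partner_of[i] is the index of entry i's known partner.
    """
    q = pool_surface(query_h)
    sims = db_embeddings @ q              # cosine similarity (rows are unit-norm)
    order = np.argsort(-sims)[:top_k]     # most similar entries first
    return [(int(partner_of[i]), float(sims[i])) for i in order]


# Toy database: four proteins with random stand-in surface features.
rng = np.random.default_rng(0)
surfaces = [rng.normal(size=(n, 8)) for n in (50, 60, 70, 80)]
db_embeddings = np.stack([pool_surface(h) for h in surfaces])
partner_of = np.array([1, 0, 3, 2])       # entries 0<->1 and 2<->3 are complexes

# A query that is a slightly perturbed copy of entry 2.
query_h = surfaces[2] + 0.01 * rng.normal(size=surfaces[2].shape)
hits = retrieve_partners(query_h, db_embeddings, partner_of, top_k=2)
print(hits)
```

Here `top_k` plays the role of the search breadth mentioned in the TL;DR: a larger value returns more candidate partners, each of which could then be passed to the decoder for interface prediction.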