Joint Masked Reconstruction and Contrastive Learning for Mining Interactions Between Proteins

Jiang Li, Xiaoping Wang

TL;DR

This work tackles PPI prediction by addressing both the extraction of intrinsic structural cues from protein residues and the mining of extrinsic interaction cues among proteins. It introduces JmcPPI, a two-stage framework that first encodes residue structure via a HetGNN-GAT architecture with masked reconstruction, then infers protein interactions on a PPI graph using a GIN encoder augmented with multi-graph contrastive learning on perturbed graphs. The approach yields strong, generalizable performance across three large benchmark datasets and multiple data partitions, supported by comprehensive ablations and qualitative analyses. The results suggest substantial practical impact for structure-informed PPI prediction and open avenues for extending the framework to related bioinformatics tasks and multi-modal data integration.
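The masked reconstruction idea mentioned above can be illustrated with a minimal sketch. It is not the JmcPPI implementation: the zero-vector mask, the mean squared error objective, and the names `mask_features` and `reconstruction_loss` are all assumptions chosen for illustration. The sketch hides a random subset of residue feature vectors and scores a reconstruction only on the hidden positions.

```python
import random


def mask_features(features, mask_ratio, rng):
    """Zero out a random subset of residue feature vectors (assumed masking scheme).

    Returns the masked copy and the sorted indices of the masked residues.
    """
    n = len(features)
    k = max(1, int(round(mask_ratio * n)))  # mask at least one residue
    masked_idx = sorted(rng.sample(range(n), k))
    masked = [list(row) for row in features]  # copy so the input stays intact
    for i in masked_idx:
        masked[i] = [0.0] * len(features[i])
    return masked, masked_idx


def reconstruction_loss(original, reconstructed, masked_idx):
    """Mean squared error computed only over the masked residues."""
    total, count = 0.0, 0
    for i in masked_idx:
        for a, b in zip(original[i], reconstructed[i]):
            total += (a - b) ** 2
            count += 1
    return total / count


# Hypothetical toy input: four residues with two features each.
residues = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked, idx = mask_features(residues, mask_ratio=0.5, rng=random.Random(0))

# A perfect reconstruction has zero loss; returning the masked input does not.
perfect = reconstruction_loss(residues, residues, idx)
untrained = reconstruction_loss(residues, masked, idx)
```

In a real model, an encoder and decoder would sit between `masked` and the reconstruction, and the loss would drive the encoder to infer hidden residue features from their structural neighbors.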

Abstract

Protein-protein interaction (PPI) prediction is instrumental in elucidating the mechanisms underlying cellular operations and has significant practical implications for pharmaceutical development and clinical treatment. At present, most research methods concentrate on the analysis of amino acid sequences, while investigations based on protein structures remain at an early stage. Although several structure-based algorithms have emerged in recent years, they still face inherent challenges: (1) extracting the intrinsic structural information of proteins typically requires substantial computational resources; (2) these models rely too heavily on seen protein data and struggle to uncover interaction cues between unknown proteins. To advance this domain, this paper introduces a novel PPI prediction method that joins masked reconstruction and contrastive learning, termed JmcPPI. The method divides the PPI prediction task into two distinct phases: during the residue structure encoding phase, JmcPPI devises two feature reconstruction tasks and employs a graph attention mechanism to capture structural information between residues; during the protein interaction inference phase, JmcPPI perturbs the original PPI graph and employs a multi-graph contrastive learning strategy to thoroughly mine extrinsic interaction information of novel proteins. Extensive experiments conducted on three widely used PPI datasets demonstrate that JmcPPI surpasses the best existing baseline models across various data partition schemes. The associated code can be accessed via https://github.com/lijfrank-open/JmcPPI.
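The second phase, perturbing the PPI graph and contrasting the resulting views, can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: random edge dropping stands in for the perturbation, an InfoNCE-style objective with cosine similarity stands in for the contrastive loss, and the names `drop_edges`, `cosine`, and `info_nce` are hypothetical. The abstract does not specify either choice.

```python
import math
import random


def drop_edges(edges, drop_prob, rng):
    """Return a perturbed edge list, dropping each edge independently (assumed perturbation)."""
    return [e for e in edges if rng.random() >= drop_prob]


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss: protein i in view_a should match protein i in view_b.

    The embeddings of all other proteins in view_b act as negatives.
    """
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += log_denom - logits[i]
    return loss / n


# Hypothetical toy PPI graph: edges are pairs of protein indices.
ppi_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
perturbed = drop_edges(ppi_edges, drop_prob=0.25, rng=random.Random(0))

# Hypothetical embeddings of two proteins under two graph views.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(aligned, aligned)  # views agree: lower loss
loss_swapped = info_nce(aligned, swapped)  # views disagree: higher loss
```

In a full pipeline, a graph encoder (the TL;DR names GIN) would produce the embeddings for the original and perturbed graphs, and minimizing the loss would pull each protein's representations across views together while pushing different proteins apart.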

Paper Structure

This paper contains 21 sections, 26 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Overall workflow of the proposed JmcPPI.
  • Figure 2: Precision-recall (PR) curves of JmcPPI and MAPE-PPI on benchmark datasets.
  • Figure 3: F1 scores for each PPI in benchmark datasets.
  • Figure 4: Comparison of JmcPPI and MAPE-PPI on different subsets. Each dashed box indicates the percentage of the corresponding subset in the whole dataset.
  • Figure 5: Performance changes when contrastive learning tasks are removed. Here, w/o $\mathcal{L}_{\text{CON}}^\alpha$/$\mathcal{L}_{\text{CON}}^\beta$ refers to the ablation of single-view contrastive learning, and w/o $\mathcal{L}_{\text{CON}}$ signifies the removal of the entire multi-view contrastive learning.
  • ...and 4 more figures