SCMPPI: Supervised Contrastive Multimodal Framework for Predicting Protein-Protein Interactions

Shengrui XU, Tianchi Lu, Zikun Wang, Jixiu Zhai

TL;DR

SCMPPI addresses PPI prediction by fusing sequence-derived features with network topology through a multimodal encoder and training with a supervised contrastive objective that includes negative-sample filtering. The method integrates AAC, DPC, and ESMC-CKSAAP sequence representations with Node2Vec graph embeddings, and optimizes a joint loss that combines binary classification with a filtered contrastive term, yielding state-of-the-art accuracy and AUC across eight benchmarks and strong cross-species generalization. Ablation studies confirm the complementary value of sequence, graph, and contrastive components, while network-focused tests demonstrate practical utility in CD9, Wnt, and cancer-related interaction networks. Collectively, SCMPPI advances multimodal biological data analysis by enhancing representation alignment, reducing false negatives, and supporting robust PPI prediction across diverse species and contexts.

Abstract

Protein-protein interaction (PPI) prediction plays a pivotal role in deciphering cellular functions and disease mechanisms. To address the limitations of traditional experimental methods and of existing computational approaches in cross-modal feature fusion and false-negative suppression, we propose SCMPPI, a novel supervised contrastive multimodal framework. By integrating sequence-based features (AAC, DPC, ESMC-CKSAAP) with network topology (Node2Vec embeddings) and incorporating an enhanced contrastive learning strategy with negative sample filtering, SCMPPI achieves superior prediction performance. Extensive experiments on eight benchmark datasets demonstrate state-of-the-art accuracy (98.13%) and AUC (99.69%), along with excellent cross-species generalization (AUC > 99%). Successful applications to CD9 networks, Wnt pathway analysis, and cancer-specific networks further highlight its potential for disease target discovery, establishing SCMPPI as a powerful tool for multimodal biological data analysis.
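
Two of the sequence features named above, AAC (amino acid composition) and DPC (dipeptide composition), have simple standard definitions: the relative frequency of each of the 20 standard residues, and of each of the 400 ordered residue pairs. The sketch below illustrates those textbook definitions only. The function names, the handling of non-standard residues, and the output ordering are assumptions made for illustration and are not taken from the SCMPPI implementation.

```python
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in the same fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(sequence):
    """Amino acid composition: a 20-dim vector of residue frequencies.

    Assumption: non-standard residues (e.g. 'X') are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dpc(sequence):
    """Dipeptide composition: a 400-dim vector of adjacent-pair frequencies.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    valid_pairs = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            valid_pairs += 1
    if valid_pairs == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / valid_pairs for d in DIPEPTIDES]


if __name__ == "__main__":
    toy = "AAC"  # hypothetical toy sequence
    print(len(aac(toy)), len(dpc(toy)))
```

Both vectors have a fixed length regardless of sequence length, which is what lets them be concatenated with other per-protein features before being fed to an encoder.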

Paper Structure

This paper contains 34 sections, 11 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: The architecture of SCMPPI. (a) SCMPPI comprises a protein encoder, an interaction classifier, and a contrastive learning module. (b) The protein encoder combines a sequence encoder with Node2Vec. (c) The contrastive module projects embeddings into a shared space and computes the loss using cosine distance. (d) The sequence encoder extracts AAC, DPC, and ESMC-CKSAAP features. (e) The classifier predicts an interaction by outputting a score.
  • Figure 2: Comparison of results before and after contrastive learning
  • Figure 3: ESMC-CKSAAP module. (a) A protein sequence of length $L$ is passed through a pretrained protein model ESMC to obtain an embedding of size $L \times 960$. (b) The amino acid-level protein embeddings obtained are then mapped into CKSAAP according to a one-to-one correspondence rule, resulting in a new sequence embedding.
  • Figure 4: Negative pairs filtering mechanism
  • Figure 5: The impact of k on SCMPPI
  • ...and 4 more figures
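
The captions of Figure 1(c) and Figure 4 mention a cosine-distance contrastive loss and a negative-pair filtering step, and the summary describes a joint objective that adds this term to a binary classification loss. This page does not give the actual formulas, so the following is only a minimal sketch of one plausible shape for such an objective. The margin-based pairwise loss, the similarity-threshold filtering rule, and the names `margin`, `filter_threshold`, and `lam` are all assumptions, not the paper's formulation.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over predicted interaction scores."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def filtered_contrastive(za, zb, labels, margin=0.5, filter_threshold=0.9):
    """Pairwise contrastive loss on cosine distance with negative filtering.

    Assumed rule (hypothetical): a negative pair whose embeddings are
    already very similar is treated as a suspected false negative and is
    excluded from the loss instead of being pushed apart.
    """
    total, kept = 0.0, 0
    for u, v, y in zip(za, zb, labels):
        sim = cosine(u, v)
        if y == 0 and sim >= filter_threshold:
            continue  # drop the suspected false negative
        dist = 1.0 - sim
        if y == 1:
            total += dist ** 2  # pull interacting pairs together
        else:
            total += max(0.0, margin - dist) ** 2  # push others apart
        kept += 1
    return total / max(kept, 1)


def joint_loss(probs, za, zb, labels, lam=0.5, **kwargs):
    """Classification loss plus a weighted contrastive term (lam is assumed)."""
    return bce(probs, labels) + lam * filtered_contrastive(za, zb, labels, **kwargs)
```

In this sketch, filtering removes exactly the negative pairs that would otherwise incur the largest penalty, which is one way a noisy "non-interacting" label could be kept from distorting the shared embedding space.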