SCMPPI: Supervised Contrastive Multimodal Framework for Predicting Protein-Protein Interactions
Shengrui Xu, Tianchi Lu, Zikun Wang, Jixiu Zhai
TL;DR
SCMPPI addresses PPI prediction by fusing sequence-derived features with network topology through a multimodal encoder and training with a supervised contrastive objective that includes negative-sample filtering. The method integrates AAC, DPC, and ESMC-CKSAAP sequence representations with Node2Vec graph embeddings, and optimizes a joint loss that combines binary classification with a filtered contrastive term, yielding state-of-the-art accuracy and AUC across eight benchmarks and strong cross-species generalization. Ablation studies confirm the complementary value of sequence, graph, and contrastive components, while network-focused tests demonstrate practical utility in CD9, Wnt, and cancer-related interaction networks. Collectively, SCMPPI advances multimodal biological data analysis by enhancing representation alignment, reducing false negatives, and supporting robust PPI prediction across diverse species and contexts.
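The three sequence descriptors named above have standard definitions. AAC is the relative frequency of each of the 20 amino acids. DPC is the frequency of each of the 400 adjacent residue pairs. CKSAAP counts residue pairs separated by a gap of k positions. The sketch below computes these three descriptors in plain Python. It assumes the 20 standard residues, ignores non-standard letters, and normalizes CKSAAP by the number of valid pairs. The gap value k = 1 in the usage line is an arbitrary choice, not SCMPPI's setting. The "ESMC" part of ESMC-CKSAAP is not reproduced here, and neither are the Node2Vec graph embeddings.

```python
from itertools import product

# The 20 standard amino acids; other letters (X, U, B, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order for the feature vector.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: relative frequency of each residue (20-dim)."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k):
    """Composition of k-spaced amino acid pairs (400-dim for one gap k).

    Counts pairs (seq[i], seq[i + k + 1]), i.e. residues with k residues
    between them, and normalizes by the number of valid pairs counted.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


def dpc(seq):
    """Dipeptide composition: adjacent pairs, i.e. CKSAAP with k = 0."""
    return cksaap(seq, 0)


seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
features = aac(seq) + dpc(seq) + cksaap(seq, 1)
print(len(features))  # 20 + 400 + 400 = 820
```

In a full pipeline, a vector like this for each protein in a pair would be combined with that protein's graph embedding before it reaches the encoder. The concatenation shown here only illustrates the idea and is not the paper's fusion scheme.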
Abstract
Protein-protein interaction (PPI) prediction plays a pivotal role in deciphering cellular functions and disease mechanisms. Traditional experimental methods have limitations, and existing computational approaches fall short in cross-modal feature fusion and false-negative suppression. To address these issues, we propose SCMPPI, a novel supervised contrastive multimodal framework. SCMPPI integrates sequence-based features (AAC, DPC, ESMC-CKSAAP) with network topology (Node2Vec embeddings) and adds an enhanced contrastive learning strategy with negative-sample filtering, which together yield superior prediction performance. Extensive experiments on eight benchmark datasets show state-of-the-art accuracy (98.13%) and AUC (99.69%), along with excellent cross-species generalization (AUC > 99%). Successful applications to CD9 networks, Wnt pathway analysis, and cancer-specific networks further highlight its potential for disease target discovery and establish SCMPPI as a powerful tool for multimodal biological data analysis.
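The training objective combines a binary classification loss with a supervised contrastive term whose negatives are filtered. The exact filtering criterion is not given in this section, so the sketch below substitutes a hypothetical rule. When a sample with a different label has cosine similarity to the anchor at or above `neg_sim_threshold`, the sketch treats it as a likely false negative and drops it from that anchor's denominator. The contrastive term follows the usual SupCon form: positives are the other samples that share the anchor's label. Several other choices are illustrative assumptions and not the authors' settings. These include the per-pair embeddings labeled 1 (interacting) or 0, the weight `lam`, the temperature, and the threshold.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def logsumexp(xs):
    m = max(xs)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in xs))


def filtered_supcon_loss(embeddings, labels, temperature=0.1, neg_sim_threshold=0.9):
    """SupCon-style loss with a hypothetical negative filter.

    For each anchor i, positives are the other samples with the same label.
    Different-label samples whose cosine similarity to the anchor is at or
    above neg_sim_threshold are treated as likely false negatives and removed
    from the denominator (an assumed rule, not the paper's stated one).
    """
    z = [l2_normalize(e) for e in embeddings]
    total, n_anchors = 0.0, 0
    for i, zi in enumerate(z):
        sims = [sum(a * b for a, b in zip(zi, zj)) for zj in z]
        positives = [j for j in range(len(z)) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no positive pair for this anchor: skip it
        negatives = [j for j in range(len(z))
                     if labels[j] != labels[i] and sims[j] < neg_sim_threshold]
        log_denom = logsumexp([sims[j] / temperature for j in positives + negatives])
        total += -sum(sims[p] / temperature - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy on predicted interaction probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def joint_loss(probs, embeddings, labels, lam=0.5, **supcon_kwargs):
    """Classification loss plus lam times the filtered contrastive term."""
    return bce_loss(probs, labels) + lam * filtered_supcon_loss(
        embeddings, labels, **supcon_kwargs)


labels = [1, 1, 0, 0]
# Sample 2 (label 0) is identical to sample 0 (label 1): a likely false negative.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(filtered_supcon_loss(emb, labels))                         # ≈ 0.693 (log 2)
print(filtered_supcon_loss(emb, labels, neg_sim_threshold=1.1))  # no filtering: ≈ 10.0
```

The usage lines show what the filter does in practice. With filtering, the near-duplicate negative no longer dominates the denominator, so the loss stays at log 2. With the filter disabled (threshold above 1), the same batch produces a loss of about 10.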
