KEMP-PIP: A Feature-Fusion Based Approach for Pro-inflammatory Peptide Prediction

Soumik Deb Niloy, Md. Fahmid-Ul-Alam Juboraj, Swakkhar Shatabda

TL;DR

KEMP-PIP is a hybrid machine learning framework that integrates deep protein embeddings with handcrafted descriptors for robust PIP prediction, outperforming ProIn-fuse, MultiFeatVotPIP, and StackPIP.

Abstract

Pro-inflammatory peptides (PIPs) play critical roles in immune signaling and inflammation but are difficult to identify experimentally due to costly and time-consuming assays. To address this challenge, we present KEMP-PIP, a hybrid machine learning framework that integrates deep protein embeddings with handcrafted descriptors for robust PIP prediction. Our approach combines contextual embeddings from pretrained ESM protein language models with multi-scale k-mer frequencies, physicochemical descriptors, and modlAMP sequence features. Feature pruning and class-weighted logistic regression manage high dimensionality and class imbalance, while ensemble averaging with an optimized decision threshold enhances the sensitivity-specificity balance. Through systematic ablation studies, we demonstrate that integrating complementary feature sets consistently improves predictive performance. On the standard benchmark dataset, KEMP-PIP achieves an MCC of 0.505, accuracy of 0.752, and AUC of 0.762, outperforming ProIn-fuse, MultiFeatVotPIP, and StackPIP. Relative to StackPIP, these results represent improvements of 9.5% in MCC and 4.8% in both accuracy and AUC. The KEMP-PIP web server is freely available at https://nilsparrow1920-kemp-pip.hf.space/ and the full implementation at https://github.com/S18-Niloy/KEMP-PIP.
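
Two steps named in the abstract, multi-scale k-mer frequencies and ensemble averaging with an optimized decision threshold, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The choice of k = 1 and 2, the 20-letter amino-acid vocabulary, the threshold grid, the use of MCC as the tuning objective, and all function names are assumptions made for illustration only.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed vocabulary: 20 standard residues


def kmer_frequencies(seq, ks=(1, 2)):
    """Concatenate relative k-mer frequencies for each k in ks (assumed k values)."""
    features = []
    for k in ks:
        vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
        counts = dict.fromkeys(vocab, 0)
        windows = max(len(seq) - k + 1, 0)
        for i in range(windows):
            kmer = seq[i:i + k]
            if kmer in counts:  # windows with non-standard residues are not counted
                counts[kmer] += 1
        # Normalize by the number of sliding windows of length k.
        features.extend(counts[v] / windows if windows else 0.0 for v in vocab)
    return features


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def ensemble_average(prob_a, prob_b):
    """Average the positive-class probabilities of two models."""
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]


def tune_threshold(y_true, probs, grid=None):
    """Return the (threshold, MCC) pair with the highest MCC on an assumed grid."""
    grid = grid or [i / 100 for i in range(5, 96)]
    best_t, best_score = 0.5, -1.0
    for t in grid:
        preds = [int(p >= t) for p in probs]
        score = mcc(y_true, preds)
        if score > best_score:  # ties keep the lowest threshold
            best_t, best_score = t, score
    return best_t, best_score


if __name__ == "__main__":
    feats = kmer_frequencies("ACDA")
    print(len(feats))  # 20 single-residue features plus 400 dipeptide features
    avg = ensemble_average([0.1, 0.5, 0.4, 0.9], [0.1, 0.3, 0.5, 0.9])
    print(tune_threshold([0, 0, 1, 1], avg))
```

In practice the threshold would be tuned on validation data held out from the final test set, and the averaged probabilities would come from the two trained hybrid models rather than hand-written lists.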

Paper Structure (25 sections, 6 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Architecture of the KEMP-PIP framework. Four feature streams are fused into two hybrid models whose predictions are ensemble-averaged with a tuned threshold.
  • Figure 2: ROC curves for ensemble combinations (a)-(j).
  • Figure 3: Web interface of the KEMP-PIP framework, supporting FASTA file upload, CSV upload, and manual sequence entry.
  • Figure :