Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties
Srivathsan Badrinarayanan, Chakradhar Guntuboina, Parisa Mollaei, Amir Barati Farimani
TL;DR
Multi-Peptide predicts peptide properties by fusing sequence-based representations from PeptideBERT with structure-aware embeddings from a Graph Neural Network (GNN) trained on AlphaFold-derived PDB graphs. A CLIP-style contrastive loss aligns the two modalities in a shared latent space, so the model learns jointly from amino-acid sequence context and three-dimensional structural information. The approach reaches state-of-the-art accuracy on hemolysis prediction (86.185%). Its gains are task-dependent, however: on the nonfouling dataset, a fine-tuned text model still outperforms the ensemble. Overall, the work highlights the promise of multimodal learning in bioinformatics for more accurate and holistic peptide property prediction, with open resources for reproducibility and future method refinements.
Abstract
Peptides are essential in biological processes and therapeutics. In this study, we introduce Multi-Peptide, an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties. We combine PeptideBERT, a transformer model tailored for peptide property prediction, with a GNN encoder to capture both sequence-based and structural features. By employing Contrastive Language-Image Pre-training (CLIP), Multi-Peptide aligns embeddings from both modalities into a shared latent space, thereby enhancing the model's predictive accuracy. Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction. This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
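To make the alignment step concrete, the following is a minimal sketch of a CLIP-style symmetric contrastive loss over paired sequence and graph embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names (`l2_normalize`, `cross_entropy_diagonal`, `clip_style_loss`), the temperature value of 0.07, and the toy two-dimensional embeddings are all hypothetical, and plain Python lists stand in for the tensors a real training loop would use.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        # Numerically stable log-sum-exp over the row.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(seq_emb, graph_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    seq_emb[i] and graph_emb[i] are assumed to describe the same peptide.
    The temperature is an illustrative default, not a value from the paper.
    """
    seq = [l2_normalize(v) for v in seq_emb]
    graph = [l2_normalize(v) for v in graph_emb]
    # Cosine similarity between every sequence and every graph embedding,
    # scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(s, g)) / temperature for g in graph]
        for s in seq
    ]
    # Transpose to score graph-to-sequence matching as well.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: two peptides with 2-D embeddings from each modality.
seq_emb = [[1.0, 0.0], [0.0, 1.0]]
graph_aligned = [[2.0, 0.1], [0.1, 3.0]]   # pairs point in similar directions
graph_shuffled = [[0.1, 3.0], [2.0, 0.1]]  # same vectors, wrong pairing

print(clip_style_loss(seq_emb, graph_aligned))
print(clip_style_loss(seq_emb, graph_shuffled))
```

In this toy batch the correctly paired embeddings give a much lower loss than the shuffled pairing, which is the behavior that pulls matching sequence and structure representations together in the shared latent space.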
