SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification
Yingxu Wang, Victor Liang, Nan Yin, Siwei Liu, Eran Segal
TL;DR
This work introduces SGAC, a Spatial GNN-based framework for antimicrobial peptide classification that leverages OmegaFold-predicted 3D structures to build compact Cα-based graphs and encode structural information with a GNN. It tackles severe class imbalance via Weight-enhanced Contrastive Learning and Weight-enhanced Pseudo-label Distillation, enabling balanced and discriminative learning from limited labeled data. Experiments on AMP and non-AMP datasets show state-of-the-art performance across peptide lengths, with ablation and sensitivity analyses confirming the contributions of each component. The approach holds practical potential for accelerating AMP discovery by providing accurate, structure-informed predictions while mitigating dataset imbalance; future work may incorporate additional physicochemical and motif features to further boost robustness and interpretability.
Abstract
Classifying Antimicrobial Peptides (AMPs) from the vast collection of peptides derived from metagenomic sequencing offers a promising avenue for combating antibiotic resistance. However, most existing AMP classification methods rely primarily on sequence-based representations and fail to capture the spatial structural information critical for accurate identification. Although recent graph-based approaches attempt to incorporate structural information, they typically construct residue- or atom-level graphs that introduce redundant atomic details and increase structural complexity. Furthermore, the class imbalance between the small number of known AMPs and the abundant non-AMPs significantly hinders predictive performance. To address these challenges, we employ the lightweight OmegaFold model to predict the three-dimensional structures of peptides and construct peptide graphs using Cα atoms to capture their backbone geometry and spatial topology. Building on this representation, we propose the Spatial GNN-based AMP Classifier (SGAC), a novel framework that leverages Graph Neural Networks (GNNs) to extract structural features and generate discriminative graph representations. To handle class imbalance, SGAC incorporates Weight-enhanced Contrastive Learning, which uses adaptive weighting to cluster structurally similar peptides and separate dissimilar ones, and applies Weight-enhanced Pseudo-label Distillation to generate high-confidence pseudo labels for unlabeled samples, achieving balanced and consistent representation learning. Experiments on publicly available AMP and non-AMP datasets demonstrate that SGAC significantly outperforms baseline methods and achieves state-of-the-art performance.
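To make the Cα-based graph construction concrete, the following is a minimal sketch of how a peptide graph might be built from predicted Cα coordinates. It is illustrative only: the function name `build_ca_graph`, the distance-cutoff edge rule, and the 8.0 Å default are assumptions introduced here, not details stated in the abstract.

```python
import math


def build_ca_graph(ca_coords, cutoff=8.0):
    """Build a peptide graph with one node per residue (its C-alpha atom).

    ca_coords: list of (x, y, z) C-alpha positions in angstroms, in residue order.
    cutoff:    hypothetical distance threshold (angstroms) for connecting two
               residues; the value is an assumption, not taken from the paper.

    Returns a list of directed edges (i, j, distance), with both directions
    present for every connected residue pair.
    """
    edges = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= cutoff:
                # Store both directions so the graph is effectively undirected.
                edges.append((i, j, d))
                edges.append((j, i, d))
    return edges
```

With one node per residue, the graph size grows with peptide length rather than atom count, which is the compactness the abstract attributes to Cα-based graphs. The resulting edge list could then be converted to whatever format a given GNN library expects.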
