SGAC: A Graph Neural Network Framework for Imbalanced and Structure-Aware AMP Classification

Yingxu Wang, Victor Liang, Nan Yin, Siwei Liu, Eran Segal

TL;DR

This work introduces SGAC, a Spatial GNN-based framework for antimicrobial peptide classification that leverages OmegaFold-predicted 3D structures to build compact Cα-based graphs and encode structural information with a GNN. It tackles severe class imbalance via Weight-enhanced Contrastive Learning and Weight-enhanced Pseudo-label Distillation, enabling balanced and discriminative learning from limited labeled data. Experiments on AMP and non-AMP datasets show state-of-the-art performance across peptide lengths, with ablation and sensitivity analyses confirming the contributions of each component. The approach holds practical potential for accelerating AMP discovery by providing accurate, structure-informed predictions while mitigating dataset imbalance; future work may incorporate additional physicochemical and motif features to further boost robustness and interpretability.

Abstract

Classifying Antimicrobial Peptides (AMPs) from the vast collection of peptides derived from metagenomic sequencing offers a promising avenue for combating antibiotic resistance. However, most existing AMP classification methods rely primarily on sequence-based representations and fail to capture the spatial structural information critical for accurate identification. Although recent graph-based approaches attempt to incorporate structural information, they typically construct residue- or atom-level graphs that introduce redundant atomic details and increase structural complexity. Furthermore, the class imbalance between the small number of known AMPs and the abundant non-AMPs significantly hinders predictive performance. To address these challenges, we employ the lightweight OmegaFold model to predict the three-dimensional structures of peptides and construct peptide graphs from Cα atoms to capture their backbone geometry and spatial topology. Building on this representation, we propose the Spatial GNN-based AMP Classifier (SGAC), a novel framework that leverages Graph Neural Networks (GNNs) to extract structural features and generate discriminative graph representations. To handle class imbalance, SGAC incorporates Weight-enhanced Contrastive Learning, which uses adaptive weighting to cluster structurally similar peptides and separate dissimilar ones, and applies Weight-enhanced Pseudo-label Distillation to generate high-confidence pseudo labels for unlabeled samples, achieving balanced and consistent representation learning. Experiments on publicly available AMP and non-AMP datasets demonstrate that SGAC significantly outperforms the baselines and achieves state-of-the-art performance.
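
To make the Cα-based graph construction described in the abstract more concrete, the following is a minimal sketch of how such a graph could be built from predicted coordinates. The function name `build_ca_graph`, the 8 Å distance cutoff, and the toy coordinates are assumptions chosen for illustration. The abstract does not state how edges are defined, so this should not be read as the authors' exact procedure.

```python
import numpy as np


def build_ca_graph(ca_coords, cutoff=8.0):
    """Build an undirected contact graph over C-alpha atoms.

    ca_coords: (N, 3) array-like of C-alpha coordinates in angstroms,
               e.g. parsed from a predicted structure file.
    cutoff:    distance threshold in angstroms (an assumed value, not
               taken from the paper).

    Returns (edge_index, edge_dist): edge_index has shape (2, E) and
    lists each undirected edge in both directions; edge_dist holds the
    matching C-alpha distances.
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting: result is (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Connect residues closer than the cutoff, excluding self-loops.
    adj = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    src, dst = np.nonzero(adj)
    return np.stack([src, dst]), dist[src, dst]


# Toy example: four residues on a line; the last one is far from the rest.
toy_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [20.0, 0.0, 0.0]]
edge_index, edge_dist = build_ca_graph(toy_coords)
```

Using one node per residue keeps the graph size equal to the peptide length, which is the compactness argument the abstract makes against atom-level graphs. The `(2, E)` edge layout is the coordinate format that common GNN libraries accept, so the output could be handed to a graph encoder with little extra work.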

Paper Structure

This paper contains 15 sections, 26 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overall framework of our SGAC model. The framework consists of multiple stages designed to enhance AMP classification. First, OmegaFold is employed to predict the three-dimensional (3D) structure of peptides from their amino acid sequences, generating peptide graphs in which nodes represent C$_\alpha$ atoms. These graphs are then processed by a Graph Neural Network (GNN) encoder to capture structural and relational features. Next, the embeddings produced by the graph encoder are refined using three key components: Weight-enhanced Contrastive Learning, which improves feature separation between AMPs and non-AMPs; Weight-enhanced Pseudo-label Distillation, which dynamically refines predictions using high-confidence pseudo labels; and a Classification Loss, which ensures accurate supervised learning by directly optimizing classification performance. Finally, the SGAC model produces the final prediction results.
  • Figure 2: AMP and non-AMP classification visualization using different models.
  • Figure 3: Hyper-parameter sensitivity analysis of $\lambda$, $\gamma$, hidden dimension size $d$, and number of GNN layers $L$.
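
The Figure 1 caption mentions high-confidence pseudo labels and weighting to counter class imbalance. The sketch below illustrates the general idea only: it keeps unlabeled predictions above a confidence threshold and derives inverse-frequency class weights from them. The function names, the 0.9 threshold, and the inverse-frequency rule are all assumptions. The paper's actual weighting scheme is not given in this summary, and this simple stand-in should not be taken as its formulation.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.9):
    """Keep unlabeled samples whose top class probability is high enough.

    probs:     (N, C) array-like of predicted class probabilities.
    threshold: minimum confidence to accept a pseudo label (assumed value).

    Returns (labels, indices) for the retained samples.
    """
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    return labels[keep], np.nonzero(keep)[0]


def inverse_frequency_weights(labels, num_classes=2):
    """Per-class weights proportional to 1 / class frequency.

    A common stand-in for imbalance-aware weighting; classes that do not
    appear in `labels` receive weight 0.
    """
    labels = np.asarray(labels, dtype=int)
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    safe_counts = np.maximum(counts, 1.0)  # avoid division by zero
    weights = len(labels) / (num_classes * safe_counts)
    return np.where(counts > 0, weights, 0.0)


# Toy predictions for four unlabeled peptides (class 0 = non-AMP, 1 = AMP).
toy_probs = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92], [0.97, 0.03]]
pseudo_labels, kept_idx = select_pseudo_labels(toy_probs)
class_weights = inverse_frequency_weights(pseudo_labels)
```

In this toy run the second sample is discarded for low confidence, and the rarer class receives the larger weight, so minority-class pseudo labels would contribute more to a weighted loss.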