NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks

Yi-Shan Lan; Pin-Yu Chen; Tsung-Yi Ho

TL;DR

This paper proposes two novel semantic data augmentation methods, Novel Augmentation of New Node Attributes (NaNa) and Molecular Interactions and Geometric Upgrading (MiGu), which incorporate backbone chemical and side-chain biophysical information into protein classification tasks, together with a co-embedding residual learning framework.

Abstract

Protein classification tasks are essential in drug discovery. Real-world protein structures are dynamic, and these dynamics determine protein properties. However, existing machine learning methods, such as ProNet (Wang et al., 2022a), access only limited conformational characteristics and protein side-chain features, which leads to unrealistic protein structure representations and inaccurate protein class predictions. In this paper, we propose two novel semantic data augmentation methods, Novel Augmentation of New Node Attributes (NaNa) and Molecular Interactions and Geometric Upgrading (MiGu), which incorporate backbone chemical and side-chain biophysical information into protein classification tasks, together with a co-embedding residual learning framework. Specifically, we leverage the molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate protein classification. Furthermore, our semantic augmentation methods and the co-embedding residual learning framework improve the performance of GIN (Xu et al., 2019) on the EC and Fold datasets (Bairoch, 2000; Andreeva et al., 2007) by 16.41% and 11.33%, respectively. Our code is available at https://github.com/r08b46009/Code_for_MIGU_NANA/tree/main.
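The abstract lists the kinds of information added to residue nodes but not their exact encoding. The following is a minimal Python sketch of node-attribute augmentation under stated assumptions: each residue node starts from a one-hot amino-acid encoding, and we append the Kyte-Doolittle hydropathy value, an approximate formal side-chain charge at neutral pH, and a 3-state secondary-structure one-hot (helix/strand/coil, assumed to be precomputed by a tool such as DSSP). The function name `augment_node_features` and the feature set are hypothetical illustrations, not the authors' implementation in the linked repository.

```python
# Hypothetical sketch of residue-level node-attribute augmentation.
# The feature set below is an illustrative assumption, not the exact
# attributes used by NaNa.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
SS_STATES = "HEC"  # assumed 3-state secondary structure: helix, strand, coil

# Kyte-Doolittle hydropathy index for each standard residue.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate formal side-chain charge at neutral pH (histidine treated as 0).
FORMAL_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def augment_node_features(sequence: str, secondary_structure: str) -> list[list[float]]:
    """Return one row per residue: 20 one-hot + hydropathy + charge + 3 SS one-hot."""
    if len(sequence) != len(secondary_structure):
        raise ValueError("sequence and secondary_structure must have equal length")
    rows = []
    for residue, state in zip(sequence, secondary_structure):
        if residue not in KD_HYDROPATHY or state not in SS_STATES:
            raise ValueError(f"unsupported residue/state: {residue!r}/{state!r}")
        one_hot = [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
        hydropathy = KD_HYDROPATHY[residue] / 4.5  # scale the index to [-1, 1]
        charge = float(FORMAL_CHARGE.get(residue, 0))
        ss_one_hot = [1.0 if state == s else 0.0 for s in SS_STATES]
        rows.append(one_hot + [hydropathy, charge] + ss_one_hot)
    return rows


if __name__ == "__main__":
    feats = augment_node_features("DKA", "HHC")
    print(len(feats), len(feats[0]))  # prints: 3 25
```

Appending the extra columns leaves the original one-hot block intact, so a GNN such as GIN can consume the augmented matrix by widening only its input layer.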

Paper Structure (23 sections, 1 equation, 1 figure, 3 tables)

Introduction
Related Work
Protein Structure Representation
Harnessing Graph Neural Networks for Protein Structure Classification
Relationship between Chemical Information and Protein Classification
Methods
Procedure Overview
Data Augmentation
Novel Node Attributes
Novel Edge Attributes
Our Method: MiGu & NaNa Data Augmentation
Co-Embedding Residual Learning Framework
Experimental Design
Implementation Details
Datasets
...and 8 more sections

Figures (1)

Figure 2: Convergence speed with and without the residual learning framework on the EC dataset, using the GIN model with NaNa semantic protein structure augmentation. The x-axis is the number of training epochs, and the y-axis is the training loss. The model with the residual learning framework converges 1.76 times faster than the vanilla model without it.
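The caption compares GIN training with and without a residual learning framework. The co-embedding design itself is not described in this section, so the following is only a generic sketch of a residual GIN layer in plain Python: the standard GIN update h_v' = MLP((1 + ε) h_v + Σ_{u∈N(v)} h_u) (Xu et al., 2019), with the input h_v added back to the output as a skip connection. The helper names (`gin_aggregate`, `gin_layer`, `residual_gin_layer`) and the toy identity MLP are hypothetical, and lists stand in for tensors.

```python
# Hypothetical sketch: one GIN layer with and without a residual (skip) connection.
# Plain-Python lists stand in for tensors; this is not the paper's co-embedding framework.
from typing import Callable

Vector = list[float]
Matrix = list[list[float]]


def relu(x: Vector) -> Vector:
    return [max(0.0, xi) for xi in x]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: returns weight @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def gin_aggregate(h: Matrix, adjacency: list[list[int]], eps: float) -> Matrix:
    """Compute (1 + eps) * h_v + sum of neighbour features for every node v."""
    out = []
    for v, neighbours in enumerate(adjacency):
        agg = [(1.0 + eps) * x for x in h[v]]
        for u in neighbours:
            agg = [a + b for a, b in zip(agg, h[u])]
        out.append(agg)
    return out


def gin_layer(h: Matrix, adjacency: list[list[int]],
              mlp: Callable[[Vector], Vector], eps: float = 0.0) -> Matrix:
    """Vanilla GIN update: h_v' = MLP((1 + eps) h_v + sum_u h_u)."""
    return [mlp(a) for a in gin_aggregate(h, adjacency, eps)]


def residual_gin_layer(h: Matrix, adjacency: list[list[int]],
                       mlp: Callable[[Vector], Vector], eps: float = 0.0) -> Matrix:
    """Residual variant: h_v' = h_v + MLP(...); the MLP must keep the feature width."""
    updates = gin_layer(h, adjacency, mlp, eps)
    return [[hv + uv for hv, uv in zip(h[v], updates[v])] for v in range(len(h))]


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    adjacency = [[1], [0, 2], [1]]  # path graph 0 - 1 - 2

    def identity_mlp(x: Vector) -> Vector:
        return relu(linear([[1.0, 0.0], [0.0, 1.0]], x))

    print(gin_layer(h, adjacency, identity_mlp))           # [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
    print(residual_gin_layer(h, adjacency, identity_mlp))  # [[2.0, 1.0], [2.0, 3.0], [2.0, 3.0]]
```

The skip connection only works when the MLP preserves the feature dimension. The identity path gives gradients a direct route through each layer, which is one common explanation for why residual networks tend to converge faster.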