Graph-level Protein Representation Learning by Structure Knowledge Refinement

Ge Wang, Zelin Zang, Jiangbin Zheng, Jun Xia, Stan Z. Li

TL;DR

This paper proposes Structure Knowledge Refinement (SKR), a framework that uses the structure of the data to estimate the probability that a pair is positive or negative. It also proposes an augmentation strategy that naturally preserves the semantic meaning of the original data and is compatible with the SKR framework.

Abstract

This paper focuses on learning graph-level representations in an unsupervised manner. Graph-level representation learning plays an important role in a variety of real-world problems, such as molecular property prediction, protein structure feature extraction, and social network analysis. The mainstream approach, known as Graph Contrastive Learning (GCL), uses contrastive learning to facilitate graph feature extraction. Although effective, GCL inherits some complications of contrastive learning, such as the effect of false negative pairs. Moreover, augmentation strategies in GCL adapt poorly to diverse graph datasets. Motivated by these problems, we propose a novel framework called Structure Knowledge Refinement (SKR), which uses data structure to determine the probability that a pair is positive or negative. We also propose an augmentation strategy that naturally preserves the semantic meaning of the original data and is compatible with our SKR framework. Furthermore, we illustrate the effectiveness of the SKR framework through intuition and experiments. Experimental results on graph-level classification tasks demonstrate that SKR is superior to most state-of-the-art baselines.
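
To make the false-negative issue concrete, the following is a minimal sketch, not the authors' implementation. It assumes each pair carries a probability of being positive and scores it with a binary (fuzzy-set style) cross-entropy. The function name `fuzzy_cross_entropy`, the toy probabilities, and the exact loss form are illustrative assumptions; the page only states that SKR assigns pairs a probability of being positive or negative and refines structure with a fuzzy cross-entropy.

```python
import math

def fuzzy_cross_entropy(p, q, eps=1e-12):
    """Mean binary cross-entropy between soft pair memberships.

    p: target probability that each pair is positive (assumed to come
       from structure knowledge; hypothetical values below).
    q: probability that each pair is positive in the embedding space.
    This loss form is an assumption, not taken from the paper.
    """
    total = 0.0
    for p_ij, q_ij in zip(p, q):
        q_ij = min(max(q_ij, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= p_ij * math.log(q_ij) + (1.0 - p_ij) * math.log(1.0 - q_ij)
    return total / len(p)

# Three pairs: an augmented (positive) pair, a clearly different pair,
# and a pair of distinct graphs that are semantically similar.
q = [0.9, 0.1, 0.8]      # similarities in the embedding space
hard = [1.0, 0.0, 0.0]   # hard labels: every non-augmented pair is negative
soft = [1.0, 0.0, 0.7]   # soft labels: the third pair is probably positive

loss_hard = fuzzy_cross_entropy(hard, q)
loss_soft = fuzzy_cross_entropy(soft, q)
```

With hard labels the third pair is penalized heavily for being close in the embedding space, which is the false-negative effect; with soft labels the same embedding incurs a smaller loss.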

Paper Structure (12 sections, 6 equations, 5 figures, 3 tables, 1 algorithm)

This paper contains 12 sections, 6 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: The framework of Structure Knowledge Refinement (SKR). Graph-level representations in the semantic space are derived from graph data in the original space by a Graph Isomorphism Network (GIN), and augmented graph-level representations are generated by our semantic-preserving augmentation strategy. Semantic-space structure knowledge is then obtained by a structure knowledge extractor, and fuzzy cross-entropy passes this knowledge into the embedding space, refining the data structure there to derive better representations.
  • Figure 2: Comparison of SKR and GCL.
  • Figure 3: Sensitivity analysis of the hyperparameter $\alpha$ of the Dirichlet distribution on different datasets.
  • Figure 4: Ablation study of Dirichlet pooling on different datasets (with vs. without Dirichlet pooling).
  • Figure 5: Ablation study of fuzzy cross-entropy on the REDDIT-B dataset (fuzzy cross-entropy vs. normal cross-entropy).
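
The captions above mention Dirichlet pooling and a Dirichlet hyperparameter $\alpha$ without defining them on this page. One plausible reading, offered only as an assumption, is that node embeddings are pooled into a graph-level representation using convex weights sampled from a symmetric Dirichlet distribution, so that repeated draws give different views of the same graph. The sketch below illustrates that reading in plain Python; the helper names `dirichlet_weights` and `dirichlet_pool` and the toy embeddings are hypothetical.

```python
import random

def dirichlet_weights(n, alpha, rng):
    """Sample n weights from a symmetric Dirichlet(alpha) distribution
    by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_pool(node_embeddings, alpha, rng):
    """Pool node embeddings into one vector with random convex weights.
    (Assumed form of 'Dirichlet pooling'; not confirmed by the source.)"""
    weights = dirichlet_weights(len(node_embeddings), alpha, rng)
    dim = len(node_embeddings[0])
    return [
        sum(w * h[k] for w, h in zip(weights, node_embeddings))
        for k in range(dim)
    ]

rng = random.Random(0)
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy node embeddings

# Two draws give two different views of the same graph.
view_a = dirichlet_pool(nodes, alpha=1.0, rng=rng)
view_b = dirichlet_pool(nodes, alpha=1.0, rng=rng)
```

Because the weights are non-negative and sum to one, every view stays inside the convex hull of the node embeddings. Under this reading, a larger $\alpha$ concentrates the weights near uniform (close to mean pooling), while a smaller $\alpha$ produces more varied views.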