Variational Graph Auto-Encoder Based Inductive Learning Method for Semi-Supervised Classification

Hanxuan Yang, Zhaoxin Yu, Qingchao Kong, Wei Liu, Wenji Mao

TL;DR

This work proposes the Self-Label Augmented VGAE model, which takes node labels as one-hot encoded inputs and reconstructs them during training. It also proposes the Self-Label Augmentation Method (SLAM), which enriches the label information with pseudo labels that the model generates under a node-wise masking approach.

Abstract

Graph representation learning is a fundamental research issue in many application domains, and the inductive learning problem is particularly challenging because it requires models to generalize to unseen graph structures during inference. In recent years, graph neural networks (GNNs) have emerged as powerful graph models for inductive learning tasks such as node classification, but they typically rely heavily on annotated nodes under a fully supervised training setting. Compared with GNN-based methods, variational graph auto-encoders (VGAEs) are known to generalize better in capturing the internal structural information of graphs independently of node labels, and they have achieved prominent performance on multiple unsupervised learning tasks. However, little work so far has focused on leveraging the VGAE framework for inductive learning, owing to the difficulties of training the model in a supervised manner and of avoiding over-fitting to the proximity information of graphs. To solve these problems and improve the performance of VGAEs for inductive graph representation learning, we propose the Self-Label Augmented VGAE model. To leverage label information for training, our model takes node labels as one-hot encoded inputs and then performs label reconstruction during training. To overcome the scarcity of node labels in semi-supervised settings, we further propose the Self-Label Augmentation Method (SLAM), which uses pseudo labels generated by our model with a node-wise masking approach to enhance the label information. Experiments on benchmark inductive learning graph datasets verify that our proposed model achieves promising results on node classification, with particular superiority under semi-supervised learning settings.
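
As a rough illustration of what "node labels as one-hot encoded inputs" could look like, the following minimal Python sketch builds a per-node input vector from features and labels. It is not the authors' implementation: the helper names `one_hot_labels` and `encoder_input`, the all-zero row for unlabeled nodes, and the use of plain concatenation to combine labels with features are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def one_hot_labels(labels: Sequence[Optional[int]], num_classes: int) -> List[List[float]]:
    """One-hot encode node labels.

    Assumption: an unlabeled node (label None) gets an all-zero row.
    """
    rows = []
    for y in labels:
        row = [0.0] * num_classes
        if y is not None:
            row[y] = 1.0
        rows.append(row)
    return rows


def encoder_input(
    features: Sequence[Sequence[float]],
    labels: Sequence[Optional[int]],
    num_classes: int,
) -> List[List[float]]:
    """Concatenate each node's feature vector with its label vector (assumed combination)."""
    label_rows = one_hot_labels(labels, num_classes)
    return [list(x) + y for x, y in zip(features, label_rows)]


# Three nodes with 2-d features; the second node is unlabeled.
features = [[0.5, 1.0], [0.1, 0.2], [0.9, 0.3]]
labels = [2, None, 0]
inputs = encoder_input(features, labels, num_classes=3)
```

In this sketch a label-reconstruction objective would compare the decoder's output for the last `num_classes` entries against the same one-hot rows; the actual loss used by the model is not specified in this section.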

Paper Structure

This paper contains 16 sections, 9 equations, 4 figures, 4 tables, 2 algorithms.

Figures (4)

  • Figure 1: The sketch of our proposed SLA-VGAE model. During training, the nodes for testing and validation are unseen in the input graph. The true node labels are augmented via SLAM (after the warm-up stage) and then combined with node features as input of the GCN encoder to generate node representations. The decoder reconstructs the augmented node labels and features and calculates the loss function for model training (blue dashed arrows).
  • Figure 2: An illustration of the proposed SLAM for label augmentation. Some nodes of the input graph are randomly masked, and the masked graph is fed into the model $\mathcal{M}$ obtained from the last training iteration to generate labels for the unmasked nodes. The final augmented labels $\tilde{\mathbf{Y}}$ are computed by averaging over all generated labels and then filtering out the low-confidence samples; the ground-truth labels of the labeled nodes are retained as well.
  • Figure 3: Experimental results of node classification accuracy on the inductive learning datasets with different labeling rates.
  • Figure 4: Sensitivity analysis results of node classification accuracy for the generation times $K$, unmasking probability $p$, and confidence threshold $\theta$ on the Flickr (a-c) and Reddit (d-f) datasets. Different colors indicate the labeling rates of each dataset, and shading indicates the 95% confidence interval based on 3 independent runs.
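
The SLAM procedure described in the Figure 2 caption, with the parameters $K$, $p$, and $\theta$ named in Figure 4, can be sketched in a few lines of Python. This is a simplified reading of the captions, not the authors' algorithm: the function `slam`, the `predict` callback standing in for the model $\mathcal{M}$, the choice to average class probabilities, and the rule of thresholding the largest averaged probability are all assumptions. How masking affects edges of the graph is not modeled here.

```python
import random
from typing import Callable, Dict, List, Sequence

Probs = List[float]


def slam(
    nodes: Sequence[int],
    true_labels: Dict[int, int],
    predict: Callable[[List[int]], Dict[int, Probs]],
    num_classes: int,
    K: int,
    p: float,
    theta: float,
    seed: int = 0,
) -> Dict[int, int]:
    """Return augmented labels: ground truth plus confident pseudo labels."""
    rng = random.Random(seed)
    sums = {v: [0.0] * num_classes for v in nodes}
    counts = {v: 0 for v in nodes}

    for _ in range(K):
        # Keep (unmask) each node independently with probability p.
        kept = [v for v in nodes if rng.random() < p]
        # The model predicts class probabilities for the unmasked nodes only.
        for v, probs in predict(kept).items():
            for c, q in enumerate(probs):
                sums[v][c] += q
            counts[v] += 1

    augmented: Dict[int, int] = {}
    for v in nodes:
        if v in true_labels:
            # Ground-truth labels are always retained.
            augmented[v] = true_labels[v]
        elif counts[v] > 0:
            avg = [s / counts[v] for s in sums[v]]
            best = max(range(num_classes), key=avg.__getitem__)
            # Drop low-confidence pseudo labels.
            if avg[best] >= theta:
                augmented[v] = best
    return augmented


# Hypothetical stand-in for the model: fixed class probabilities per node.
FIXED = {0: [0.9, 0.1], 1: [0.55, 0.45], 2: [0.2, 0.8], 3: [0.1, 0.9]}


def toy_predict(kept: List[int]) -> Dict[int, Probs]:
    return {v: FIXED[v] for v in kept}


result = slam(
    nodes=[0, 1, 2, 3],
    true_labels={3: 0},
    predict=toy_predict,
    num_classes=2,
    K=5,
    p=1.0,
    theta=0.7,
)
```

In this toy run, node 1 receives no pseudo label because its averaged confidence falls below `theta`, while node 3 keeps its ground-truth label even though the stand-in model disagrees with it.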