GraphMAE: Self-Supervised Masked Graph Autoencoders

Zhenyu Hou; Xiao Liu; Yukuo Cen; Yuxiao Dong; Hongxia Yang; Chunjie Wang; Jie Tang

TL;DR

GraphMAE tackles the slow progress of generative self-supervised learning on graphs by reframing GAEs to reconstruct masked node features rather than graph structure. It introduces masked feature reconstruction with a re-mask decoding strategy and a scaled cosine error to improve robustness and learning signal. Across 21 public datasets and three task families (node classification, graph classification, transfer learning), GraphMAE consistently outperforms state-of-the-art contrastive and generative baselines, and often matches supervised results. The work provides design principles for robust generative SSL on graphs and demonstrates the potential of masked autoencoding in graph representation learning.

Abstract

Self-supervised learning (SSL) has been extensively explored in recent years. In particular, generative SSL has seen emerging success in natural language processing and other AI fields, such as the wide adoption of BERT and GPT. Despite this, contrastive learning, which relies heavily on structural data augmentation and complicated training strategies, has been the dominant approach in graph SSL, while the progress of generative SSL on graphs, especially graph autoencoders (GAEs), has so far not reached the potential promised in other fields. In this paper, we identify and examine the issues that negatively impact the development of GAEs, including their reconstruction objective, training robustness, and error metric. We present a masked graph autoencoder, GraphMAE, that mitigates these issues for generative self-supervised graph pretraining. Instead of reconstructing graph structures, we propose to focus on feature reconstruction with both a masking strategy and a scaled cosine error that benefit the robust training of GraphMAE. We conduct extensive experiments on 21 public datasets for three different graph learning tasks. The results show that GraphMAE, a simple graph autoencoder with careful designs, consistently outperforms both contrastive and generative state-of-the-art baselines. This study provides an understanding of graph autoencoders and demonstrates the potential of generative self-supervised pre-training on graphs.
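The abstract names the scaled cosine error without spelling it out. As a rough illustration, the sketch below assumes the error for one node is (1 - cos(x, z))^gamma, averaged over the nodes being reconstructed, where x is the original feature vector, z is the reconstruction, and gamma >= 1 is the scaling factor. The function name `scaled_cosine_error`, the default `gamma=2.0`, and the `eps` guard are illustrative choices, not values taken from this page.

```python
import numpy as np

def scaled_cosine_error(x, z, gamma=2.0, eps=1e-12):
    """Mean of (1 - cos(x_i, z_i)) ** gamma over rows.

    x, z: arrays of shape (num_nodes, feature_dim).
    gamma: assumed scaling factor (>= 1); larger values down-weight
           rows that are already well reconstructed.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    # L2-normalize each row; eps guards against division by zero.
    x_unit = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)
    z_unit = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)
    cos = np.sum(x_unit * z_unit, axis=1)
    # Clip so rounding error cannot make the base slightly negative.
    err = np.clip(1.0 - cos, 0.0, 2.0)
    return float(np.mean(err ** gamma))
```

Because the cosine term ignores vector norms, the error is insensitive to the scale of the features, and raising it to a power above 1 shrinks the contribution of rows whose direction is already close to the target.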

Paper Structure (21 sections, 4 equations, 5 figures, 9 tables)

Introduction
Related Work
Contrastive Self-Supervised Graph Learning
Generative Self-Supervised Graph Learning
The GraphMAE Approach
The GAE Problem and GraphMAE
The Design of GraphMAE
Training and Inference
Experiments
Node Classification
Graph Classification
Transfer Learning
Ablation Studies
Conclusion
Appendix
...and 6 more sections

Figures (5)

Figure 1: Comparison between generative SSL methods and the effect of GraphMAE design. AE: autoencoder methods; No Struct.: no structure reconstruction objective; Mask Feat.: use masking to corrupt input features; GNN Decoder: use a GNN as the decoder; Re-mask Dec.: re-mask the encoder output before it is fed into the decoder; Space: run-time memory consumption; MSE: Mean Squared Error; SCE: Scaled Cosine Error (proposed in this paper); CE: Cross-Entropy Error.
Figure 2: Illustration of GraphMAE and the comparison with GAE. We underline the key operations in GraphMAE. During pre-training, GraphMAE first masks input node features with a mask token [MASK]. The corrupted graph is encoded into code by a GNN encoder. In the decoding, GraphMAE re-masks the code of selected nodes with another token [DMASK], and then employs a GNN, e.g., GAT, GIN, as the decoder. The output of the decoder is used to reconstruct input node features of masked nodes, with the scaled cosine error as the criterion. Previous GAEs usually use a single-layer MLP or Laplacian matrix in the decoding and focus more on restoring graph structure.
Figure 3: Ablation studies of mask ratio and scaling factor.
Figure 4: Performance on PPI using GAT with 4 attention heads, compared to other baselines. Self-supervised methods benefit substantially from larger model size, and GraphMAE can outperform the supervised model.
Figure 5: Ablation study of mask ratio and scaling factor $\gamma$ on ogbn-arxiv and IMDB-B.
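The pre-training pipeline described in the Figure 2 caption (mask input features, encode, re-mask the code, decode, score masked nodes with the scaled cosine error) can be sketched as a single forward pass. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the mean-aggregation layer stands in for a real GNN such as GAT or GIN, the `[MASK]` and `[DMASK]` tokens are fixed vectors here rather than learned parameters, and `mask_ratio=0.5`, `gamma=2.0`, and all function and parameter names are hypothetical.

```python
import numpy as np

def sce(x, z, gamma, eps=1e-12):
    """Assumed scaled cosine error: mean of (1 - cos) ** gamma over rows."""
    xu = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)
    zu = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)
    return float(np.mean(np.clip(1.0 - np.sum(xu * zu, axis=1), 0.0, 2.0) ** gamma))

def normalize_adj(adj):
    """Add self-loops and row-normalize (simple mean aggregation)."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def graphmae_forward(adj, x, params, rng, mask_ratio=0.5, gamma=2.0):
    """One illustrative forward pass; returns (loss, masked_idx, recon)."""
    n = x.shape[0]
    num_mask = max(1, int(mask_ratio * n))
    masked = rng.permutation(n)[:num_mask]

    # 1) Corrupt the input: masked nodes get the [MASK] token.
    x_in = x.copy()
    x_in[masked] = params["mask_token"]

    # 2) Encode the corrupted graph (placeholder one-layer GNN with ReLU).
    a_hat = normalize_adj(adj)
    code = np.maximum(a_hat @ x_in @ params["w_enc"], 0.0)

    # 3) Re-mask: replace the code of the same nodes with [DMASK].
    code_in = code.copy()
    code_in[masked] = params["dmask_token"]

    # 4) Decode with another message-passing layer (placeholder GNN decoder).
    recon = a_hat @ code_in @ params["w_dec"]

    # 5) Score only the masked nodes against their original features.
    loss = sce(x[masked], recon[masked], gamma)
    return loss, masked, recon

# Toy usage: a 6-node ring graph with 4-dimensional features.
rng = np.random.default_rng(0)
n, d, h = 6, 4, 8
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
x = rng.normal(size=(n, d))
params = {
    "mask_token": np.zeros(d),        # learned in practice; fixed here
    "dmask_token": np.zeros(h),       # learned in practice; fixed here
    "w_enc": rng.normal(size=(d, h)) * 0.5,
    "w_dec": rng.normal(size=(h, d)) * 0.5,
}
loss, masked, recon = graphmae_forward(adj, x, params, rng)
```

Because the decoder aggregates over neighbors, a re-masked node must be reconstructed from the codes of the nodes around it, which is the reason the caption gives for using a GNN rather than an MLP as the decoder. In a real setup the weights and both tokens would be trained by backpropagating this loss, which the sketch omits.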
