Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders

Chuang Liu, Yuyao Wang, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu

TL;DR

Random masking in Graph Masked Autoencoders (GMAE) ignores how much each node matters to the graph structure, which can make pre-training suboptimal. This work introduces StructMAE, a structure-guided masking framework with two components: Structure-based Scoring, which rates node importance using either a predefined score (PageRank) or a learnable scoring network, and Structure-guided Masking, which starts from random masking and progressively shifts toward masking high-importance nodes. Together they inject graph structure priors into the masking process. Across unsupervised and transfer learning benchmarks, StructMAE consistently outperforms state-of-the-art GMAE baselines, demonstrating improved data efficiency and representation quality. The approach highlights the practical impact of incorporating structural information into self-supervised graph pre-training and opens avenues for extending structure-aware masking to edges and other graph tasks.

Abstract

Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the potential of leveraging the graph's structural composition as a fundamental and unique prior in the masked pre-training process. To this end, we introduce a novel structure-guided masking strategy (i.e., StructMAE), designed to refine the existing GMAE models. StructMAE involves two steps: 1) Structure-based Scoring: Each node is evaluated and assigned a score reflecting its structural significance. Two types of scoring are proposed: predefined and learnable. 2) Structure-guided Masking: With the obtained assessment scores, we develop an easy-to-hard masking strategy that gradually increases the structural awareness of the self-supervised reconstruction task. Specifically, the strategy begins with random masking and progresses to masking structure-informative nodes based on the assessment scores. This design gradually and effectively guides the model in learning graph structural information. Furthermore, extensive experiments consistently demonstrate that our StructMAE method outperforms existing state-of-the-art GMAE models in both unsupervised and transfer learning tasks. Code is available at https://github.com/LiuChuang0059/StructMAE.
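
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes PageRank as the predefined score and, purely for illustration, a linear schedule that blends uniform random noise with the normalized score as training progresses. The function names `pagerank_scores` and `structure_guided_mask`, the `damping` and `iters` parameters, and the blending rule are all hypothetical.

```python
import random


def pagerank_scores(num_nodes, edges, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph given as (u, v) pairs."""
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    scores = [1.0 / num_nodes] * num_nodes
    for _ in range(iters):
        # Score held by isolated nodes is redistributed uniformly.
        dangling = sum(scores[u] for u in range(num_nodes) if not neighbors[u])
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_scores = [base] * num_nodes
        for u in range(num_nodes):
            if neighbors[u]:
                share = damping * scores[u] / len(neighbors[u])
                for v in neighbors[u]:
                    new_scores[v] += share
        scores = new_scores
    return scores


def structure_guided_mask(scores, mask_ratio, epoch, total_epochs, rng=random):
    """Pick node indices to mask, moving from random to score-driven choice.

    Assumed schedule (illustrative only): each node gets a sampling key
    (1 - progress) * noise + progress * normalized_score, and the nodes
    with the largest keys are masked.
    """
    n = len(scores)
    num_mask = max(1, round(mask_ratio * n))
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    progress = epoch / max(1, total_epochs - 1)  # 0 at first epoch, 1 at last
    keys = [
        (1.0 - progress) * rng.random() + progress * (s - lo) / span
        for s in scores
    ]
    ranked = sorted(range(n), key=lambda i: keys[i], reverse=True)
    return sorted(ranked[:num_mask])


# Toy star graph: node 0 is the hub, nodes 1..5 are leaves.
edges = [(0, i) for i in range(1, 6)]
scores = pagerank_scores(6, edges)
for epoch in (0, 5, 9):
    masked = structure_guided_mask(scores, mask_ratio=0.2, epoch=epoch, total_epochs=10)
    print(epoch, masked)
```

On this toy graph the first epoch masks a uniformly random node, while the last epoch always masks the hub (node 0), since it has the highest PageRank score. A learnable scoring network would replace `pagerank_scores` while leaving the masking schedule unchanged.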

Paper Structure (47 sections, 7 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: Two examples that illustrate why the random masking strategy in GMAE can be suboptimal. a) The masked nodes (i.e., C) are too easy to predict, so the model gains little useful knowledge. b) Masking a large number of informative chemical nodes (i.e., SO3) prevents the model from perceiving the structural information in the graph.
  • Figure 2: Overview of the proposed model. (a) The overall architecture of the proposed StructMAE. (b) SBS: It evaluates node importance based on the graph's structural information. This evaluation can be conducted using either a predefined or learnable approach. (c) SGM: It progressively increases the masking probability of important nodes as the training epochs advance.
  • Figure 3: Effects of raising the masking probability on nodes with structural information (dark red nodes in the right part). The blue dashed line illustrates the results under the random masking strategy.
  • Figure 4: Performance of StructMAE-P (Ours) with different extra probabilities. Baseline refers to GraphMAE.
  • Figure 5: Visualization of the top-ranked (Top@1) molecule, identified by molecular representation similarity, to the query molecule from ZINC15. The molecule representations are obtained from the pre-trained model in the transfer learning task.