Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-training
Yufei He, Zhenyu Hou, Yukuo Cen, Jun Hu, Feng He, Xu Cheng, Jie Tang, Bryan Hooi
TL;DR
PGT tackles the problem of generalizing graph pre-training across diverse, web-scale graphs by introducing a scalable transformer-based framework with Masked Graph Modeling objectives. It leverages Personalized PageRank sampling to form context sequences, uses a transformer encoder with two pre-training tasks (feature reconstruction and local structure reconstruction), and reuses a pre-trained decoder for feature augmentation during inference. Empirical results on public benchmarks and Tencent data show state-of-the-art performance and strong cross-graph transfer, including a dynamic extension (PGT-Dynamic) that surpasses specialized dynamic models. The work demonstrates practical scalability and broad applicability, suggesting a path toward universal graph foundation models for industrial and real-world domains.
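To make the idea of forming a context sequence from Personalized PageRank (PPR) concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the restart probability `alpha`, the iteration count, and the deterministic top-k selection (used here as a stand-in for sampling) are all assumptions chosen for illustration on a toy adjacency list.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=50):
    """Approximate PPR scores for `seed` by power iteration.

    adj:   dict mapping node -> list of neighbor nodes.
    alpha: restart probability (hypothetical default, not from the paper).
    """
    scores = {node: 0.0 for node in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for node, mass in scores.items():
            neighbors = adj[node]
            if not neighbors:
                # Dangling node: return its walk mass to the seed.
                nxt[seed] += (1 - alpha) * mass
                continue
            share = (1 - alpha) * mass / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        # Restart step: total mass is 1, so alpha of it jumps to the seed.
        nxt[seed] += alpha
        scores = nxt
    return scores


def context_sequence(adj, seed, k=3, alpha=0.15):
    """Return [seed] followed by the k non-seed nodes with the highest PPR score."""
    scores = personalized_pagerank(adj, seed, alpha=alpha)
    ranked = sorted((n for n in adj if n != seed), key=lambda n: (-scores[n], n))
    return [seed] + ranked[:k]


# Toy graph: a triangle (0, 1, 2) with a tail 2-3-4-5.
toy_graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5],
    5: [4],
}
sequence = context_sequence(toy_graph, seed=0, k=3)
```

The resulting sequence starts with the seed node and is followed by the nodes that a restart-biased random walk from the seed visits most often, which is the kind of node sequence a transformer encoder could then consume.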
Abstract
Graph pre-training has concentrated mainly on graph-level tasks involving small graphs (e.g., molecular graphs) or on learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen nodes and even unseen graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. Unlike the original masked autoencoder setup, in which the pre-trained decoder is discarded, we propose a novel strategy that reuses the decoder for feature augmentation. Our framework, tested on the publicly available ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges, achieves state-of-the-art performance, showcasing scalability and efficiency. We have deployed our framework on Tencent's online game data, confirming its capability to pre-train on real-world graphs with over 540 million nodes and 12 billion edges and to generalize effectively across diverse static and dynamic downstream tasks.
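The feature-reconstruction task can be illustrated with a small sketch of the masking step and a loss computed only on masked positions. This is an assumption-laden illustration rather than the paper's code: the helper names, the zero-valued mask token, the 50% mask ratio, and the use of mean squared error are all hypothetical choices, and the encoder and decoder are left out entirely.

```python
import random


def mask_features(features, mask_ratio=0.5, mask_token=0.0, rng=None):
    """Replace a random subset of node feature rows with a mask token.

    Returns the corrupted copy and the sorted indices of the masked rows.
    """
    rng = rng or random.Random(0)
    n = len(features)
    num_masked = max(1, int(round(mask_ratio * n)))
    masked_idx = sorted(rng.sample(range(n), num_masked))
    masked_set = set(masked_idx)
    corrupted = [
        [mask_token] * len(row) if i in masked_set else list(row)
        for i, row in enumerate(features)
    ]
    return corrupted, masked_idx


def masked_reconstruction_loss(original, reconstructed, masked_idx):
    """Mean squared error over the masked rows only (illustrative loss choice)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for a, b in zip(original[i], reconstructed[i]):
            total += (a - b) ** 2
            count += 1
    return total / count


# Four nodes in a context sequence, each with a 2-dimensional feature vector.
node_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
corrupted, masked_idx = mask_features(node_features, rng=random.Random(1))

# A real model would map `corrupted` to reconstructed features here; passing
# the corrupted input straight through shows the loss an untrained model pays.
baseline_loss = masked_reconstruction_loss(node_features, corrupted, masked_idx)
```

Restricting the loss to masked rows means the model gets no credit for copying visible features, so it has to infer the hidden ones from the rest of the sequence.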
