Boosting Graph Foundation Model from Structural Perspective
Yao Cheng, Yige Zhao, Jianxiang Yu, Xiang Li
TL;DR
BooG tackles cross-domain generalization for graph data by unifying node/edge attributes and graph structures through text encodings and super-node-based aggregation. It introduces a contrastive self-supervised pre-training objective and a similarity-based downstream predictor that operate on a unified textual representation of text-attributed graphs (TAGs), with an explicit class-conditioned embedding $h(p_{ij}) = h(s_i) + \alpha \cdot c(l_j)$. Empirically, BooG achieves state-of-the-art or competitive results on node, edge, and graph tasks across seven datasets in supervised, few-shot, and zero-shot settings, while remaining efficient (e.g., pre-training in minutes). This work enables robust cross-domain deployment of graph foundation models by bridging semantic attributes and varying topologies in a common latent space.
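To make the class-conditioned embedding concrete, here is a minimal sketch in plain Python. It evaluates $h(p_{ij}) = h(s_i) + \alpha \cdot c(l_j)$ elementwise and then picks a class by similarity. The function names (`class_conditioned`, `cosine`, `predict`), the use of cosine similarity, and the choice to score class $j$ by comparing $h(p_{ij})$ with $c(l_j)$ are illustrative assumptions, not details confirmed by this summary.

```python
import math

def class_conditioned(h_s, c_l, alpha):
    """Elementwise h(p_ij) = h(s_i) + alpha * c(l_j)."""
    return [a + alpha * b for a, b in zip(h_s, c_l)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(h_s, class_embeddings, alpha=0.5):
    """Return the label whose class-conditioned embedding is most similar
    to its own class embedding (an assumed scoring rule, for illustration)."""
    scores = {
        label: cosine(class_conditioned(h_s, c_l, alpha), c_l)
        for label, c_l in class_embeddings.items()
    }
    return max(scores, key=scores.get)

# Toy usage with hypothetical 2-d embeddings.
classes = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
print(predict([1.0, 0.0], classes))
```

Here $\alpha$ controls how strongly the class label embedding shifts the super-node embedding; the value 0.5 is arbitrary.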
Abstract
Graph foundation models have recently attracted significant attention due to their strong generalizability. Although existing methods resort to language models to learn unified semantic representations across domains, they disregard the unique structural characteristics of graphs from different domains. To address this problem, we boost graph foundation models from a structural perspective and propose BooG. The model constructs virtual super nodes to unify the structural characteristics of graph data from different domains. Specifically, the super nodes fuse the information of anchor nodes and class labels, where each anchor node captures the information of a node or a graph instance to be classified. Instead of using the raw graph structure, we connect super nodes to all nodes within their neighborhood by virtual edges. This new structure allows for effective information aggregation while unifying cross-domain structural characteristics. Additionally, we propose a novel pre-training objective based on contrastive learning, which learns more expressive representations for graph data and generalizes effectively to different domains and downstream tasks. Experimental results on various datasets and tasks demonstrate the superior performance of BooG. We provide our code and data here: https://anonymous.4open.science/r/BooG-EE42/.
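The super-node construction described above can be sketched as follows. This is a rough illustration under stated assumptions, not the released implementation: the neighborhood is taken to be the k-hop neighborhood found by breadth-first search, the anchor itself is included in it, and the names `k_hop_neighborhood`, `build_super_node_edges`, and the `("super", anchor)` identifier are hypothetical.

```python
from collections import deque

def k_hop_neighborhood(adj, anchor, k):
    """Return the sorted nodes within k hops of `anchor` (anchor included),
    using breadth-first search over an adjacency dict."""
    depth = {anchor: 0}
    queue = deque([anchor])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return sorted(depth)

def build_super_node_edges(adj, anchors, k=2):
    """Create one virtual super node per anchor and connect it by a
    virtual edge to every node in the anchor's k-hop neighborhood."""
    edges = []
    for anchor in anchors:
        super_node = ("super", anchor)  # hypothetical identifier scheme
        for v in k_hop_neighborhood(adj, anchor, k):
            edges.append((super_node, v))
    return edges

# Toy usage: a path graph 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(build_super_node_edges(adj, anchors=[0], k=2))
```

Because every super node sees a star-shaped structure over its neighborhood regardless of the original topology, aggregation over these virtual edges has the same form on graphs from different domains.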
