
Boosting Graph Foundation Model from Structural Perspective

Yao Cheng, Yige Zhao, Jianxiang Yu, Xiang Li

TL;DR

BooG tackles cross-domain generalization for graph data by unifying node/edge attributes and graph structures through text encodings and a super-node-based aggregation. It introduces a contrastive self-supervised pre-training objective and a similarity-based downstream predictor on a unified textual representation of text-attributed graphs (TAGs), with an explicit class-conditioned embedding $h(p_{ij}) = h(s_i) + \alpha \cdot c(l_j)$. Empirically, BooG achieves state-of-the-art or competitive results across node, edge, and graph tasks on seven datasets in supervised, few-shot, and zero-shot settings, while maintaining high efficiency (e.g., pre-training in minutes). This work enables robust cross-domain deployment of graph foundation models by bridging semantic attributes and varying topologies in a common latent space.
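
The class-conditioned embedding in the TL;DR can be illustrated with a minimal sketch. It assumes that $h(s_i)$ and $c(l_j)$ are plain vectors of equal dimension, that $\alpha$ is a scalar weight, and that prediction picks the class whose label embedding is most cosine-similar to the resulting super-node embedding. The function names (`class_conditioned`, `cosine`, `predict_by_similarity`) are hypothetical and not taken from the BooG code, and the neighborhood aggregation step is omitted here.

```python
import math


def class_conditioned(h_s, c_l, alpha):
    """Element-wise h(p_ij) = h(s_i) + alpha * c(l_j)."""
    return [a + alpha * b for a, b in zip(h_s, c_l)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_by_similarity(h_s, class_embs, alpha=0.5):
    """Score every candidate class and return (best_label, scores).

    Assumption: the score of class j is the cosine similarity between
    the super-node embedding h(p_ij) and the label embedding c(l_j).
    """
    scores = {}
    for label, c_l in class_embs.items():
        h_p = class_conditioned(h_s, c_l, alpha)
        scores[label] = cosine(h_p, c_l)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    anchor = [1.0, 0.0]  # toy anchor embedding h(s_i)
    classes = {"a": [1.0, 0.0], "b": [0.0, 1.0]}  # toy label embeddings
    print(predict_by_similarity(anchor, classes))
```

Because no classifier parameters are trained in this sketch, it corresponds most closely to the zero-shot setting; the supervised and few-shot settings described in the paper additionally tune an MLP on top of the frozen representations.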

Abstract

Graph foundation models have recently attracted significant attention due to their strong generalizability. Although existing methods resort to language models to learn unified semantic representations across domains, they disregard the unique structural characteristics of graphs from different domains. To address this problem, we boost graph foundation models from a structural perspective and propose BooG. The model constructs virtual super nodes to unify the structural characteristics of graph data from different domains. Specifically, the super nodes fuse the information of anchor nodes and class labels, where each anchor node captures the information of a node or a graph instance to be classified. Instead of using the raw graph structure, we connect super nodes to all nodes within their neighborhood by virtual edges. This new structure allows for effective information aggregation while unifying cross-domain structural characteristics. Additionally, we propose a novel pre-training objective based on contrastive learning, which learns more expressive representations for graph data and generalizes effectively to different domains and downstream tasks. Experimental results on various datasets and tasks demonstrate the superior performance of BooG. We provide our code and data here: https://anonymous.4open.science/r/BooG-EE42/.
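
The super-node construction described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the neighborhood is taken to be the k-hop neighborhood of the anchor node, the graph is an adjacency dictionary, and aggregation is a simple mean over the super node and its virtual neighbors. The helper names (`k_hop_neighborhood`, `virtual_edges`, `aggregate_super_node`) are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(adj, anchor, k):
    """Return all nodes within k hops of `anchor` (excluding the anchor)."""
    seen = {anchor}
    frontier = deque([(anchor, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(anchor)
    return seen


def virtual_edges(adj, anchor, k, super_id):
    """Connect the super node directly to every node in the neighborhood."""
    return [(super_id, n) for n in sorted(k_hop_neighborhood(adj, anchor, k))]


def aggregate_super_node(h_super, feats, neighbors):
    """Mean of the super-node embedding and its virtual neighbors' features.

    Assumption: mean pooling stands in for whatever aggregator BooG uses.
    """
    vectors = [h_super] + [feats[n] for n in neighbors]
    dim = len(h_super)
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 with node 0 as the anchor.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(virtual_edges(adj, anchor=0, k=2, super_id="p"))
```

Connecting the super node to every node in the neighborhood by a direct virtual edge means aggregation takes a single step regardless of the raw topology, which is one way to read the abstract's claim that the new structure unifies cross-domain structural characteristics.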

Paper Structure (25 sections, 12 equations, 6 figures, 9 tables)


Figures (6)

  • Figure 1: The overall process of BooG. The model consists of two parts: a model pre-trained in a self-supervised manner and a downstream classifier implemented with an MLP. The model's input includes text-attributed graphs and class descriptions. BooG first uses a pre-trained LM to unify different graph data and standardizes the input for node-level and graph-level tasks as sub-graphs. Subsequently, BooG introduces super nodes to establish a standardized aggregation mechanism that fuses rich information from neighborhoods and associated class labels. We freeze the parameters of the pre-trained model and obtain the final instance representations through similarity matching. In particular, the similarity matching process can serve as zero-shot learning to predict unseen instances. For supervised and few-shot learning scenarios, BooG keeps the pre-trained model frozen and adapts it to multiple downstream tasks by tuning only the parameters of the MLP.
  • Figure 2: The text format for graph node and class label on Pubmed.
  • Figure 3: Unifying graph structures by super nodes.
  • Figure 4: t-SNE visualization on Cora.
  • Figure 5: Ablation study.
  • ...and 1 more figures