One Model for One Graph: A New Perspective for Pretraining with Cross-domain Graphs

Jingzhe Liu, Haitao Mao, Zhikai Chen, Bingheng Li, Wenqi Fan, Mingxuan Ju, Tong Zhao, Neil Shah, Jiliang Tang

TL;DR

This work tackles cross-domain graph generalization by proposing OMOG, a two-stage framework that trains a separate source model for each pretraining graph and uses a scoring module to select and fuse a subset of experts at inference. By encoding node attributes into a unified text space and applying non-parametric SGC, OMOG mitigates domain heterogeneity and negative transfer, while its fusion mechanism has a Bayesian averaging interpretation and theoretical guarantees. Empirically, OMOG achieves superior zero-shot and few-shot transfer on ten text-attributed graph datasets for node classification and link prediction, outperforming single-backbone and mixture baselines and offering improved efficiency. The approach offers a scalable path toward graph foundation models, enabling targeted cross-graph transfer and easy incorporation of new graphs without retraining the entire bank.
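The non-parametric SGC step mentioned above can be sketched as repeated multiplication of the node features by a symmetrically normalized adjacency matrix with self-loops, following the standard SGC formulation. This is a minimal illustration, not the authors' implementation: the function name `sgc_propagate`, the toy graph, and the random features standing in for language-model embeddings are all assumptions.

```python
import numpy as np


def sgc_propagate(adj: np.ndarray, features: np.ndarray, k: int = 2) -> np.ndarray:
    """Return S^k X, where S = D^{-1/2} (A + I) D^{-1/2} (standard SGC propagation)."""
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)                 # add self-loops
    deg = a_tilde.sum(axis=1)                 # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: S[i, j] = A~[i, j] / sqrt(deg[i] * deg[j])
    s = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = features.astype(float)
    for _ in range(k):                        # no learnable weights are involved
        out = s @ out
    return out


# Toy example (hypothetical data): a 3-node path graph 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.random.default_rng(0).normal(size=(3, 4))  # stand-in for text embeddings
h = sgc_propagate(adj, x, k=2)
```

Because the propagation has no trainable parameters, the same routine can be applied to any graph whose node attributes live in the shared text-embedding space.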

Abstract

Graph Neural Networks (GNNs) have emerged as a powerful tool to capture intricate network patterns, achieving success across different domains. However, existing GNNs require careful domain-specific architecture designs and training from scratch on each dataset, leading to an expertise-intensive process with difficulty in generalizing across graphs from different domains. Therefore, it can be hard for practitioners to infer which GNN model can generalize well to graphs from their domains. To address this challenge, we propose a novel cross-domain pretraining framework, "one model for one graph," which overcomes the limitations of previous approaches that failed to use a single GNN to capture diverse graph patterns across domains with significant gaps. Specifically, we pretrain a bank of expert models, each corresponding to a specific dataset. At inference on a new graph, gating functions choose a subset of experts to effectively integrate prior model knowledge while avoiding negative transfer. Extensive experiments consistently demonstrate the superiority of our proposed method on both link prediction and node classification tasks.

Paper Structure

This paper contains 18 sections, 4 theorems, 34 equations, 6 figures, 4 tables.

Key Result

Lemma 1

Under Assumption 2, the KL divergence between the predictive distributions of the test graph $G_{\text{test}}$ and a source graph $G_i$ is given by: For small differences in variance ($|\sigma_i^2 - \sigma_{\text{test}}^2| \ll \sigma_{\text{test}}^2$), using the first-order Taylor expansion of $\log(1+x)$, we obtain the approximation: Since $\sigma_{\text{test}}^2$ is constant across models, the
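For reference, the standard closed form for the KL divergence between two univariate Gaussians, together with its approximation for a small variance gap, can be written as follows. This is a sketch under the assumption that Assumption 2 models the predictive distributions as univariate Gaussians $\mathcal{N}(\mu_{\text{test}}, \sigma_{\text{test}}^2)$ and $\mathcal{N}(\mu_i, \sigma_i^2)$. The symbols $\mu_{\text{test}}$, $\mu_i$, and $\delta_i$, as well as the direction of the divergence, are assumptions and are not taken from the paper.

```latex
\begin{align}
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_{\text{test}}, \sigma_{\text{test}}^2)\,\middle\|\,\mathcal{N}(\mu_i, \sigma_i^2)\right)
  &= \frac{1}{2}\log\frac{\sigma_i^2}{\sigma_{\text{test}}^2}
   + \frac{\sigma_{\text{test}}^2 + (\mu_{\text{test}} - \mu_i)^2}{2\sigma_i^2}
   - \frac{1}{2}. \\
\intertext{Writing $\sigma_i^2 = \sigma_{\text{test}}^2 (1 + \delta_i)$ with $|\delta_i| \ll 1$,
and using $\log(1 + \delta_i) \approx \delta_i$ and $1/(1 + \delta_i) \approx 1 - \delta_i$:}
D_{\mathrm{KL}}
  &\approx \frac{\delta_i}{2} + \frac{1 - \delta_i}{2} - \frac{1}{2}
   + \frac{(\mu_{\text{test}} - \mu_i)^2}{2\sigma_{\text{test}}^2}\,(1 - \delta_i) \\
  &\approx \frac{(\mu_{\text{test}} - \mu_i)^2}{2\sigma_{\text{test}}^2}
   \quad \text{(to leading order)}.
\end{align}
```

Under these assumptions the variance terms cancel at first order, so ranking source graphs by KL divergence reduces to ranking them by squared distance between means, which is consistent with the lemma's remark that $\sigma_{\text{test}}^2$ is constant across models.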

Figures (6)

  • Figure 1: Existing "one model for all graphs" pipeline vs. the proposed "one model for one graph" pipeline.
  • Figure 2: An illustration of the pretraining stage. The first step encodes the node attributes with language models and then applies SGC to incorporate the structure information. The second step pretrains the source model with contrastive loss. The third step trains a scoring module to filter the domain-related features.
  • Figure 3: An illustration of the inference stage. We input the test graph features into each scoring model to calculate the relevance values to the corresponding source models. Then we select source models with top-k largest values and fuse them into a new model to infer the downstream tasks.
  • Figure 4: The impact of key components on OMOG.
  • Figure 5: The performance of different scoring module designs.
  • ...and 1 more figure
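The inference stage described in the Figure 3 caption (score each source model, keep the top-k, fuse them) can be sketched as below. This is a hedged illustration only: the function `select_and_fuse`, the softmax weighting, and fusing expert outputs rather than model parameters are assumptions made for the example, since the captions do not specify how the fusion is carried out.

```python
import numpy as np


def select_and_fuse(relevance, expert_outputs, top_k=2):
    """Pick the top_k most relevant experts and average their outputs.

    relevance:      one relevance score per source model (higher = more relevant)
    expert_outputs: list of arrays of identical shape (num_nodes, dim), one per expert
    """
    relevance = np.asarray(relevance, dtype=float)
    top_idx = np.argsort(relevance)[::-1][:top_k]     # indices sorted by descending score
    top_scores = relevance[top_idx]
    # Assumed weighting: softmax over the selected scores (numerically stabilized).
    weights = np.exp(top_scores - top_scores.max())
    weights /= weights.sum()
    stacked = np.stack([expert_outputs[i] for i in top_idx])   # (top_k, num_nodes, dim)
    fused = np.tensordot(weights, stacked, axes=1)             # (num_nodes, dim)
    return top_idx, weights, fused


# Toy example (hypothetical data): four experts, expert i outputs a constant array of i.
relevance = [0.1, 0.9, 0.5, 0.2]
expert_outputs = [np.full((3, 2), float(i)) for i in range(4)]
top_idx, weights, fused = select_and_fuse(relevance, expert_outputs, top_k=2)
```

Experts outside the top-k receive zero weight, which is the mechanism the summary credits with avoiding negative transfer from irrelevant source graphs.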

Theorems & Definitions (6)

  • Definition 1: Graph Domain and Graph Representations
  • Definition 2: Relevance Score and KL Divergence Approximation
  • Lemma 1: KL Divergence Approximation in Gaussian Case
  • Proposition 1: Optimal Model Selection with Bayesian Model Averaging
  • Theorem 3: OMOG Model Fusion Minimizes Expected Transfer Error
  • Corollary 1: Mitigation of Negative Transfer