ZeroG: Investigating Cross-dataset Zero-shot Transferability in Graphs

Yuhan Li, Peisong Wang, Zhixun Li, Jeffrey Xu Yu, Jia Li

TL;DR

The paper tackles cross-dataset zero-shot transfer for graph node classification by placing node and class representations in a shared semantic space derived from a language model (LM), and by enriching pre-training data through a prompt-based subgraph sampling strategy. It introduces ZeroG, which combines an LM-based unified representation, prompting nodes that add semantic information to sampled subgraphs, and a parameter-efficient LoRA pre-training regime to enable zero-shot generalization across heterogeneous graphs. Key contributions include an analysis of cross-dataset zero-shot transfer, an architecture that improves in-domain and cross-domain transfer on seven benchmarks, and an ablation study validating the importance of each component. The approach advances graph foundation model research, offers a publicly available codebase, and shows practical potential for generalizing graph reasoning without dataset-specific fine-tuning.
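To make the "parameter-efficient LoRA" idea concrete, the following is a minimal sketch of a generic low-rank adapter on a single linear layer. It is not taken from the ZeroG codebase: the class name `LoRALinear`, the constant initialization of `A`, and the toy dimensions are assumptions chosen for illustration (LoRA normally initializes `A` randomly and `B` to zeros).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of the general LoRA idea, not the ZeroG code.
    """

    def __init__(self, weight, rank, alpha=1.0):
        self.weight = weight  # frozen, shape (out_dim, in_dim)
        out_dim, in_dim = len(weight), len(weight[0])
        self.scale = alpha / rank
        # Assumption: constant init for A keeps the example deterministic.
        self.A = [[0.01] * in_dim for _ in range(rank)]   # shape (rank, in_dim)
        # B starts at zero, so the adapter initially leaves W unchanged.
        self.B = [[0.0] * rank for _ in range(out_dim)]   # shape (out_dim, rank)

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        """Count only the adapter parameters; W itself stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 4x4 identity weight standing in for one LM projection matrix.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
initial_output = layer.forward([1.0, 2.0, 3.0, 4.0])
trainable = layer.num_trainable()
full = len(identity) * len(identity[0])
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, which is one reason this kind of fine-tuning can preserve the zero-shot behavior of the underlying model. Only `A` and `B` would be updated during pre-training; here that is 8 values against 16 in the full matrix, and the gap widens quickly at realistic dimensions.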

Abstract

With the development of foundation models such as large language models, zero-shot transfer learning has become increasingly significant. This is highlighted by the generative capabilities of NLP models like GPT-4, and the retrieval-based approaches of CV models like CLIP, both of which effectively bridge the gap between seen and unseen data. In the realm of graph learning, the continuous emergence of new graphs and the challenges of human labeling also amplify the necessity for zero-shot transfer learning, driving the exploration of approaches that can generalize across diverse graph data without dataset-specific or label-specific fine-tuning. In this study, we extend such paradigms to zero-shot transferability in graphs by introducing ZeroG, a new framework tailored to enable cross-dataset generalization. To address inherent challenges such as feature misalignment, mismatched label spaces, and negative transfer, we leverage a language model to encode both node attributes and class semantics, ensuring consistent feature dimensions across datasets. We also propose a prompt-based subgraph sampling module that enriches the semantic and structural information of extracted subgraphs using prompting nodes and neighborhood aggregation, respectively. We further adopt a lightweight fine-tuning strategy that reduces the risk of overfitting and maintains the zero-shot learning efficacy of the language model. The results underscore the effectiveness of our model in achieving significant cross-dataset zero-shot transferability, opening pathways for the development of graph foundation models. Code and data are available at https://github.com/NineAbyss/ZeroG.
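The core mechanism described in the abstract, encoding nodes and class names in one space and then matching them, can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' implementation: the 2-D vectors stand in for language-model embeddings, mean aggregation over neighbors is an assumed choice of neighborhood aggregation, and the function names `aggregate` and `zero_shot_classify` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def aggregate(features, adjacency, hops):
    """Average each node with its neighbors, repeated `hops` times.

    Assumption: plain mean aggregation, used here only for illustration.
    """
    current = [list(f) for f in features]
    for _ in range(hops):
        updated = []
        for node, feat in enumerate(current):
            group = [feat] + [current[n] for n in adjacency[node]]
            updated.append([sum(col) / len(group) for col in zip(*group)])
        current = updated
    return current


def zero_shot_classify(node_embeddings, class_embeddings):
    """Assign each node the class whose embedding is most similar to it."""
    return [
        max(range(len(class_embeddings)),
            key=lambda c: cosine(emb, class_embeddings[c]))
        for emb in node_embeddings
    ]


# Toy stand-ins for LM embeddings of node text and of class names.
node_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
class_features = [[1.0, 0.0], [0.0, 1.0]]
adjacency = {0: [1], 1: [0], 2: [3], 3: [2]}

smoothed = aggregate(node_features, adjacency, hops=1)
predictions = zero_shot_classify(smoothed, class_features)
```

Because node and class vectors share one space with a fixed dimension, no classifier head has to be trained per dataset: a new label set only requires encoding its class names.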

Paper Structure

This paper contains 37 sections, 4 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Analysis of tasks associated with the zero-shot transfer in graphs.
  • Figure 2: Our proposed pipeline ZeroG facilitates cross-dataset zero-shot node classification with three key components: (a) It uses unified graph representations to merge node and class encodings via language models. (b) It employs prompt-based subgraph sampling to create pre-training data from rich subgraphs. (c) Its upstream pre-training adopts a parameter-efficient approach to suit various datasets while preserving zero-shot abilities and preventing overfitting. Finally, we can leverage the pre-trained model to perform downstream inference on target datasets.
  • Figure 3: Hyperparameter study of iterations $\lambda$ and the number of hops $k$.
  • Figure 4: Efficiency analysis of ZeroG.
  • Figure 5: Embedding visualization of Cora. Circles ($\bullet$) represent nodes, while stars ($\star$) represent classes.