Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights

Zhikai Chen, Haitao Mao, Jingzhe Liu, Yu Song, Bingheng Li, Wei Jin, Bahare Fatemi, Anton Tsitsulin, Bryan Perozzi, Hui Liu, Jiliang Tang

TL;DR

This work develops a comprehensive benchmark for text-space Graph Foundation Models (GFMs), which address the challenge of heterogeneous graph features by mapping node attributes into a unified text space via LLM embeddings. It defines four GFM paradigms (co-training or pre-training, each in a task-specific or cross-task configuration) and curates 20+ text-space datasets across domains (e.g., CS, e-commerce, biology) to evaluate node classification (NC), link prediction (LP), and graph classification (GC) under unified settings. Empirical results show that text-space representations can mitigate negative transfer, but the gains depend on task-specific inductive biases and graph structure: co-training delivers notable benefits mainly for graph-level tasks, while pre-training yields mixed improvements. The study provides practical guidance and a broad, multi-domain benchmark to spur further development of versatile and scalable text-space GFMs.
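To make the idea of a unified text space concrete, here is a minimal sketch of how two graphs with unrelated attribute schemas can end up with features of the same dimension. Everything in it is an illustrative assumption: `toy_text_encoder` is a hashing stand-in for an LLM encoder (it is not the model used in the paper), and the node fields, description templates, and `EMBED_DIM` are hypothetical.

```python
import hashlib
import math

EMBED_DIM = 16  # hypothetical size; real LLM encoders produce much larger vectors


def toy_text_encoder(text, dim=EMBED_DIM):
    """Stand-in for an LLM sentence encoder (an assumption, not the paper's model).

    Hashes each whitespace token into one of `dim` buckets, counts hits,
    and L2-normalises the resulting vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def describe_paper(node):
    # Hypothetical template: turn citation-graph attributes into a sentence.
    return f"Paper titled {node['title']}. Abstract: {node['abstract']}"


def describe_product(node):
    # Hypothetical template: turn e-commerce attributes into a sentence.
    return (
        f"Product {node['name']} in category {node['category']}, "
        f"priced at {node['price']} dollars."
    )


# Two graphs whose raw node attributes share no schema.
citation_nodes = [{"title": "Graph neural networks", "abstract": "We study message passing."}]
product_nodes = [{"name": "USB cable", "category": "electronics", "price": 9}]

# After verbalisation and encoding, both live in the same feature space.
citation_features = [toy_text_encoder(describe_paper(n)) for n in citation_nodes]
product_features = [toy_text_encoder(describe_product(n)) for n in product_nodes]
```

Because both feature lists have the same dimensionality, a single backbone could in principle consume nodes from either graph, which is the property the text-space formulation relies on.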

Abstract

Given the ubiquity of graph data and its applications in diverse domains, building a Graph Foundation Model (GFM) that can work well across different graphs and tasks with a unified backbone has recently garnered significant interest. A major obstacle to achieving this goal is that graphs from different domains often exhibit diverse node features. Inspired by multi-modal models that align different modalities with natural language, text has recently been adopted to provide a unified feature space for diverse graphs. Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems. First, the absence of a comprehensive benchmark with unified problem settings hinders a clear understanding of the comparative effectiveness and practical value of different text-space GFMs. Second, there is a lack of sufficient datasets to thoroughly explore the methods' full potential and verify their effectiveness across diverse settings. To address these issues, we build a comprehensive benchmark that provides novel text-space datasets and a thorough evaluation under unified problem settings. Empirical results provide new insights and inspire future research directions. Our code and data are publicly available at https://github.com/CurryTang/TSGFM.

Paper Structure (45 sections, 8 figures, 18 tables)

Figures (8)

  • Figure 1: We propose four paradigms: (Co-training, task-specific), (Co-training, cross-tasks), (Pre-training, task-specific), (Pre-training, cross-tasks).
  • Figure 2: Our proposed text-space dataset collection, covering $20$+ datasets from diverse domains.
  • Figure 3: Transforming attributes and labels into text space.
  • Figure 4: Comparison of different GFM and link prediction-specific models co-trained on three small-scale graphs. Hits@100 is adopted as the metric.
  • Figure 5: Our benchmark comprises three main components: (1) Diverse text-space datasets: Covering $23$ text-space datasets from diverse domains; (2) GFM building block: Implementation of mainstream techniques to build GFMs; (3) Comprehensive Evaluation: We propose four use cases to evaluate the performance of GFMs thoroughly.
  • ...and 3 more figures
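
The Figure 4 caption reports Hits@100 for link prediction. As a rough illustration of what such a metric measures, the sketch below assumes the commonly used definition in which a positive edge counts as a hit when its score is strictly higher than the K-th highest negative-edge score. The function name `hits_at_k` and the toy scores are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges scored strictly above the k-th highest negative.

    Assumed convention: with fewer than k negatives, every positive is a hit.
    """
    if not pos_scores:
        raise ValueError("pos_scores must be non-empty")
    if len(neg_scores) < k:
        return 1.0
    # Score of the k-th best-ranked negative edge acts as the threshold.
    threshold = sorted(neg_scores, reverse=True)[k - 1]
    hits = sum(1 for score in pos_scores if score > threshold)
    return hits / len(pos_scores)


# Toy scores from a hypothetical link predictor.
positive_edge_scores = [0.9, 0.8, 0.3]
negative_edge_scores = [0.85, 0.5, 0.2, 0.1]

hits_at_1 = hits_at_k(positive_edge_scores, negative_edge_scores, k=1)
hits_at_2 = hits_at_k(positive_edge_scores, negative_edge_scores, k=2)
```

With these toy scores, only one positive edge beats the single best negative, while two of the three beat the second-best negative, so the metric becomes more forgiving as K grows.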