Evaluating Progress in Graph Foundation Models: A Comprehensive Benchmark and New Insights

Xingtong Yu; Shenghua Ye; Ruijuan Liang; Chang Zhou; Hong Cheng; Xinming Zhang; Yuan Fang

TL;DR

A new benchmark is presented that jointly evaluates topic and format gaps across the full GFM pipeline, including multi-domain self-supervised pre-training and few-shot downstream adaptation, and provides a timely evaluation of recent GFMs in the rapidly evolving landscape.

Abstract

Graph foundation models (GFMs) aim to acquire transferable knowledge by pre-training on diverse graphs, so that they can be adapted to various downstream tasks. However, domain shift in graphs is inherently two-dimensional: graphs differ not only in what they describe (topic domains) but also in how they are represented (format domains). Most existing GFM benchmarks vary only topic domains, thereby obscuring how knowledge transfers across both dimensions. We present a new benchmark that jointly evaluates topic and format gaps across the full GFM pipeline, including multi-domain self-supervised pre-training and few-shot downstream adaptation, and provides a timely evaluation of recent GFMs in a rapidly evolving landscape. Our protocol enables controlled assessment in four settings: (i) pre-training on diverse topics and formats, then adapting to unseen downstream datasets; (ii) the same pre-training as in (i), then adapting to datasets seen during pre-training; (iii) pre-training on a single topic domain, then adapting to other topic domains; (iv) pre-training on a base format, then adapting to other formats. This two-axis evaluation disentangles semantic generalization from robustness to representational shifts. We conduct extensive evaluations of eight state-of-the-art GFMs on 33 datasets spanning seven topic domains and six format domains, surfacing new empirical observations and practical insights for future research. Code and data are available at https://github.com/smufang/GFMBenchmark.
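The four settings differ only in which datasets feed pre-training and which are used for downstream adaptation. As a minimal sketch, assuming each dataset carries exactly one topic label and one format label, the splits can be written as simple filters over a dataset pool. The `Dataset` and `Split` types, the `setting_*` helpers, and the toy pool are hypothetical illustrations, not the benchmark's actual API; in particular, how the benchmark selects the held-out datasets for (i) is not specified here, so they are passed in explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str   # hypothetical dataset identifier
    topic: str  # topic domain: what the graph describes
    fmt: str    # format domain: how the graph is represented


@dataclass(frozen=True)
class Split:
    pretrain: tuple[str, ...]    # datasets used for self-supervised pre-training
    downstream: tuple[str, ...]  # datasets used for few-shot adaptation


def setting_i(pool, held_out):
    """(i) Pre-train on diverse topics/formats; adapt to unseen datasets."""
    pre = tuple(d.name for d in pool if d.name not in held_out)
    down = tuple(d.name for d in pool if d.name in held_out)
    return Split(pre, down)


def setting_ii(pool, held_out):
    """(ii) Same pre-training as (i); adapt to datasets seen in pre-training."""
    pre = setting_i(pool, held_out).pretrain
    return Split(pre, pre)


def setting_iii(pool, source_topic):
    """(iii) Pre-train on one topic domain; adapt to the other topics."""
    pre = tuple(d.name for d in pool if d.topic == source_topic)
    down = tuple(d.name for d in pool if d.topic != source_topic)
    return Split(pre, down)


def setting_iv(pool, base_fmt):
    """(iv) Pre-train on a base format; adapt to the other formats."""
    pre = tuple(d.name for d in pool if d.fmt == base_fmt)
    down = tuple(d.name for d in pool if d.fmt != base_fmt)
    return Split(pre, down)


# Hypothetical toy pool: names and domain labels are illustrative only.
POOL = [
    Dataset("cite_a", "citation", "homogeneous"),
    Dataset("cite_b", "citation", "heterogeneous"),
    Dataset("shop_a", "e-commerce", "homogeneous"),
    Dataset("shop_b", "e-commerce", "text-attributed"),
]

if __name__ == "__main__":
    splits = {
        "I": setting_i(POOL, held_out={"shop_b"}),
        "II": setting_ii(POOL, held_out={"shop_b"}),
        "III": setting_iii(POOL, source_topic="citation"),
        "IV": setting_iv(POOL, base_fmt="homogeneous"),
    }
    for label, split in splits.items():
        print(f"Setting {label}: pretrain={split.pretrain} -> downstream={split.downstream}")
```

Reusing the pre-training pool of (i) inside (ii) keeps the two settings tied to identical pre-training, so comparing them isolates the effect of the downstream data having been seen; (iii) and (iv) each restrict pre-training along one axis only, which is what lets topic and format shifts be measured separately.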

Paper Structure

This paper contains 23 sections, 2 equations, 3 figures, and 34 tables.

Introduction
Related Work
Target Models and Domain Composition
Graph Foundation Models
Domain Composition
Benchmarking Protocols
Empirical Results and Analysis
Setting I: Adapting to Unseen Datasets
Setting II: Adapting to Seen Datasets
Setting III: Adapting across Topic Domains
Setting IV: Format Domain Adaptation
Conclusions and Future Directions
GFMs Pipeline
Benchmark Pipeline
Evaluated and GFM-style Methods
...and 8 more sections

Figures (3)

Figure 1: Comparisons with Setting I; higher values indicate a larger advantage for Setting I. (a) Single-topic pre-training (Setting III) vs. multi-topic pre-training (Setting I). (b) Base-format pre-training (Setting IV) vs. multi-format pre-training (Setting I).
Figure 2: Graph foundation models pipeline.
Figure 3: Four evaluation settings in our benchmark.
