Position: Graph Foundation Models are Already Here

Haitao Mao; Zhikai Chen; Wenzhuo Tang; Jianan Zhao; Yao Ma; Tong Zhao; Neil Shah; Mikhail Galkin; Jiliang Tang

TL;DR

The paper argues that Graph Foundation Models (GFMs) should be built around a graph vocabulary: basic transferable units that capture structures invariant across diverse graphs and thereby enable transfer across tasks and domains. It surveys existing GFMs, classifies them by transferability, and sets out transferability principles for node classification, link prediction, graph classification, and cross-task settings, grounded in network analysis, expressiveness, and stability. It discusses neural scaling laws for GFMs, covering data and model scaling, and examines the role of large language models (LLMs) on graphs, including as feature encoders or predictors, while noting limitations and open questions about whether a universal vocabulary can emerge. The work emphasizes practical directions for vocabulary construction, principled architecture choices, and data strategies, and highlights cross-domain applications and open challenges for the next generation of GFMs.

Abstract

Graph Foundation Models (GFMs) are emerging as a significant research topic in the graph domain, aiming to develop graph models trained on extensive and diverse data to enhance their applicability across various tasks and domains. Developing GFMs presents unique challenges compared with traditional Graph Neural Networks (GNNs), which are typically trained from scratch for specific tasks on particular datasets. The primary challenge in constructing GFMs lies in effectively leveraging vast and diverse graph data to achieve positive transfer. Drawing inspiration from existing foundation models in the CV and NLP domains, we propose a novel perspective for GFM development by advocating for a "graph vocabulary", in which the basic transferable units underlying graphs encode the invariance on graphs. We ground the construction of the graph vocabulary in essential aspects including network analysis, expressiveness, and stability. Such a vocabulary perspective can potentially advance future GFM design in line with the neural scaling laws. All relevant resources on GFM design can be found here.

Paper Structure (35 sections, 2 figures, 2 tables)

Introduction
Existing GFMs and Key Designs
Existing GFM Categories
The Key to A Successful GFM Design.
Graph Transferability Principles with Actionable Steps
An overview on Graph transferability principles
Transferability Principles in Node Classification
Transferability Principles in Link Prediction
Transferability Principles in Graph Classification
Transferability Principles across Tasks
Neural Scaling Law on GFMs
When Neural Scaling Law Happens
Data Scaling
Model Scaling
Leveraging Large-scale LMs for Graphs
...and 20 more sections

Figures (2)

Figure 1: In this graph, nodes $v_1$ and $v_4$ are isomorphic; links $(v_1, v_2)$ and $(v_2, v_4)$ are not isomorphic. However, a vanilla GNN assigns $v_1$ and $v_4$ the same node representation and therefore gives the same prediction for links $(v_1, v_2)$ and $(v_2, v_4)$.
Figure 2: Unifying different task formulations: (a) Link view: Given the target node, node classification is converted to the link prediction between the target node and corresponding label nodes. (b) Subgraph view: Node classification (orange node) is converted to the (green) ego-graph classification. Link prediction (orange nodes) is converted to the (green) induced-subgraph classification.
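The limitation described in the Figure 1 caption can be reproduced on a toy example. The sketch below is illustrative only: the path graph $v_1 - v_2 - v_3 - v_4$, the sum-aggregation update, and the product-based `link_score` are assumptions chosen for simplicity, not the graph or the model from the paper. On this graph $v_1$ and $v_4$ are isomorphic, so message passing gives them the same representation, and a predictor that sees only the two node representations scores $(v_1, v_2)$ and $(v_2, v_4)$ identically even though one pair is adjacent and the other is two hops apart.

```python
from collections import deque

# Hypothetical path graph v1 - v2 - v3 - v4 (an assumption, not the paper's figure).
edges = [(1, 2), (2, 3), (3, 4)]
adj = {v: [] for v in (1, 2, 3, 4)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)


def message_passing(adj, rounds=2):
    """Sum-aggregation message passing starting from identical node features."""
    h = {v: 1 for v in adj}
    for _ in range(rounds):
        # Each node adds its neighbors' current values to its own.
        h = {v: h[v] + sum(h[u] for u in adj[v]) for v in adj}
    return h


def link_score(h, u, v):
    """Score a candidate link from the two node representations only."""
    return h[u] * h[v]


def distance(adj, src, dst):
    """Shortest-path distance via breadth-first search."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            return seen[x]
        for y in adj[x]:
            if y not in seen:
                seen[y] = seen[x] + 1
                queue.append(y)
    return None


h = message_passing(adj)
same_nodes = h[1] == h[4]                                   # isomorphic nodes collide
same_scores = link_score(h, 1, 2) == link_score(h, 2, 4)    # so the link scores collide
different_structure = distance(adj, 1, 2) != distance(adj, 2, 4)
print(h, same_nodes, same_scores, different_structure)
```

Any scoring function that depends only on the pair of node representations would show the same collision here; telling the two links apart requires information about the pair itself, such as the distance between its endpoints.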
