Position: Graph Foundation Models are Already Here

Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, Jiliang Tang

TL;DR

The paper argues that Graph Foundation Models (GFMs) should be built around a graph vocabulary that captures invariant structures across diverse graphs, enabling transfer across tasks and domains. It surveys existing GFMs, classifies them by transferability, and explicates transferability principles across node classification, link prediction, graph classification, and cross-task settings, anchored in network analysis, expressiveness, and stability. It discusses neural scaling laws for GFMs, detailing data and model scaling, and explores the role of large language models (LLMs) in graph contexts, including as feature encoders or predictors, while noting limitations and open questions about whether a universal vocabulary can emerge. The work emphasizes practical directions for vocabulary construction, principled architecture choices, and data strategies, and highlights broad cross-domain applications and open challenges for the next generation of GFMs.

Abstract

Graph Foundation Models (GFMs) are emerging as a significant research topic in the graph domain, aiming to develop graph models trained on extensive and diverse data to enhance their applicability across various tasks and domains. Developing GFMs poses unique challenges compared with traditional Graph Neural Networks (GNNs), which are typically trained from scratch for specific tasks on particular datasets. The primary challenge in constructing GFMs lies in effectively leveraging vast and diverse graph data to achieve positive transfer. Drawing inspiration from existing foundation models in the CV and NLP domains, we propose a novel perspective on GFM development by advocating for a "graph vocabulary", in which the basic transferable units underlying graphs encode the invariances shared across graphs. We ground the construction of the graph vocabulary in essential aspects including network analysis, expressiveness, and stability. Such a vocabulary perspective can potentially advance future GFM design in line with neural scaling laws. All resources relevant to GFM design can be found here.

Paper Structure (35 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: In this graph, nodes $v_1$ and $v_4$ are isomorphic, whereas links $(v_1, v_2)$ and $(v_2, v_4)$ are not. However, a vanilla GNN assigns $v_1$ and $v_4$ the same node representation and therefore gives the same prediction to links $(v_1, v_2)$ and $(v_2, v_4)$.
  • Figure 2: Unifying different task formulations: (a) Link view: Given the target node, node classification is converted to link prediction between the target node and the corresponding label nodes. (b) Subgraph view: Node classification (orange node) is converted to (green) ego-graph classification. Link prediction (orange nodes) is converted to (green) induced-subgraph classification.
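The failure mode described in Figure 1 can be reproduced with a small sketch. The graph here is an assumption: a hypothetical 4-node path v1 - v2 - v3 - v4, chosen so that v1 and v4 are isomorphic (swap the two ends) while the candidate links (v1, v2) and (v2, v4) are not (one pair is adjacent, the other is two hops apart). It is not necessarily the graph drawn in the figure. The message-passing layer, its fixed weights, and the dot-product decoder are likewise illustrative stand-ins for a "vanilla GNN", not the paper's model.

```python
import math

# Hypothetical path graph v1 - v2 - v3 - v4 (assumed; not necessarily Figure 1's graph).
ADJ = {
    "v1": ["v2"],
    "v2": ["v1", "v3"],
    "v3": ["v2", "v4"],
    "v4": ["v3"],
}

# Toy message-passing layer with fixed, arbitrary weights (illustrative only):
# h_v <- tanh(W_SELF * h_v + W_NBR * sum_{u in N(v)} h_u), per feature dimension.
W_SELF = [0.5, -0.3]
W_NBR = [0.8, 0.4]


def gnn_layer(h):
    new_h = {}
    for v, nbrs in ADJ.items():
        agg = [sum(h[u][i] for u in nbrs) for i in range(len(W_SELF))]
        new_h[v] = [math.tanh(W_SELF[i] * h[v][i] + W_NBR[i] * agg[i])
                    for i in range(len(W_SELF))]
    return new_h


def node_embeddings(num_layers=3):
    # No informative node features: every node starts from the same vector.
    h = {v: [1.0, 1.0] for v in ADJ}
    for _ in range(num_layers):
        h = gnn_layer(h)
    return h


def link_score(h, a, b):
    # Symmetric decoder that only sees the two endpoint embeddings.
    return sum(x * y for x, y in zip(h[a], h[b]))


def common_neighbors(a, b):
    # A pairwise structural feature computed from the link itself.
    return len(set(ADJ[a]) & set(ADJ[b]))


h = node_embeddings()
print(h["v1"] == h["v4"])  # True: isomorphic nodes get identical embeddings
print(link_score(h, "v1", "v2"), link_score(h, "v2", "v4"))  # identical scores
print(common_neighbors("v1", "v2"), common_neighbors("v2", "v4"))  # 0 1
```

Because the decoder only combines node embeddings, identical embeddings for v1 and v4 force identical scores for the two links. A pairwise feature such as the common-neighbor count, used here purely as one example, does separate them.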
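Both views in Figure 2 can be sketched on a toy graph. Everything concrete below is an assumption for illustration: the edge list, the training labels, the `("label", c)` naming of label nodes, the use of a common-neighbor count as a stand-in for a learned link predictor, and the choice of the union of the endpoints' k-hop neighborhoods as the subgraph for link prediction. With common-neighbor scoring, the link view reduces to a majority vote over the target's labeled neighbors; a real GFM would replace that score with a trained model.

```python
from collections import deque

# Hypothetical undirected toy graph; node 3 is the unlabeled target.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
TRAIN_LABELS = {0: "A", 1: "A", 2: "A", 4: "B", 5: "B", 6: "B"}


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# (a) Link view: add one label node per class, link each training node to its
# label node, then classify the target by scoring target-to-label-node links.
def link_view_classify(edges, train_labels, target):
    adj = build_adj(edges)
    for node, label in train_labels.items():
        label_node = ("label", label)  # hypothetical label-node naming
        adj.setdefault(label_node, set()).add(node)
        adj[node].add(label_node)
    classes = sorted(set(train_labels.values()))
    # Common-neighbor count stands in for a learned link predictor.
    scores = {c: len(adj[target] & adj[("label", c)]) for c in classes}
    return max(classes, key=lambda c: scores[c]), scores


# (b) Subgraph view helpers: nodes within k hops of any source, via multi-source BFS.
def k_hop_nodes(adj, sources, k):
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def induced_subgraph(edges, nodes):
    return sorted((u, v) for u, v in edges if u in nodes and v in nodes)


# Node classification of `target` -> classification of its k-hop ego-graph.
def ego_graph(edges, target, k=1):
    nodes = k_hop_nodes(build_adj(edges), [target], k)
    return nodes, induced_subgraph(edges, nodes)


# Link prediction for (u, v) -> classification of the subgraph induced by the
# union of both endpoints' k-hop neighborhoods (an assumed, common choice).
def link_subgraph(edges, u, v, k=1):
    nodes = k_hop_nodes(build_adj(edges), [u, v], k)
    return nodes, induced_subgraph(edges, nodes)


pred, scores = link_view_classify(EDGES, TRAIN_LABELS, target=3)
print(pred, scores)  # A {'A': 2, 'B': 1}
print(ego_graph(EDGES, 3, k=1)[1])  # [(1, 2), (1, 3), (2, 3), (3, 4)]
print(link_subgraph(EDGES, 0, 5, k=1)[1])
```

In both views, every task ends up as the same kind of problem (scoring a link, or classifying a subgraph), which is what allows a single model to be shared across task types.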