Graph Foundation Models: Concepts, Opportunities and Challenges
Jiawei Liu, Cheng Yang, Zhiyuan Lu, Junze Chen, Yibo Li, Mengmei Zhang, Ting Bai, Yuan Fang, Lichao Sun, Philip S. Yu, Chuan Shi
TL;DR
The paper defines Graph Foundation Models (GFMs) as models pre-trained on broad graph data that can be adapted to diverse graph tasks, and identifies emergence and homogenization as their core capabilities. It offers a three-path taxonomy (GNN-based, LLM-based, GNN+LLM-based) and provides a structured review of backbones, pre-training, and adaptation strategies for each path. By comparing GFMs with language foundation models, the authors identify intrinsic and extrinsic differences and outline concrete challenges in data, architectures, evaluation, and applications. The work maps future directions, including cross-domain data, novel backbones, and improved alignment between graph structure and textual knowledge, to accelerate the development of practical, trustworthy GFMs.
Abstract
Foundation models have emerged as critical components in a variety of artificial intelligence applications, and have shown significant success in natural language processing and several other domains. Meanwhile, the field of graph machine learning is undergoing a paradigm shift from shallow methods to more sophisticated deep learning approaches. The generalization and adaptation capabilities of foundation models motivate graph machine learning researchers to discuss the potential of developing a new graph learning paradigm. This paradigm envisions models that are pre-trained on extensive graph data and can be adapted for various graph tasks. Despite this burgeoning interest, there is a noticeable lack of clear definitions and systematic analyses pertaining to this new domain. To this end, this article introduces the concept of Graph Foundation Models (GFMs), and offers an exhaustive explanation of their key characteristics and underlying technologies. We proceed to classify the existing work related to GFMs into three distinct categories, based on their dependence on graph neural networks and large language models. In addition to providing a thorough review of the current state of GFMs, this article also outlines potential avenues for future research in this rapidly evolving domain.
