GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang

TL;DR

GOFA addresses the lack of a true graph foundation model by combining generative language modeling with graph structure through interleaved GNN and LLM layers. It treats graphs as Text-Attributed Graphs (TAGs) and trains on graph-language self-supervised tasks to support large-scale pretraining, task fluidity, and graph-aware generation, yielding a model with strong zero-shot performance across node, link, and graph tasks. The architecture encodes graphs with memory tokens, using a pre-trained ICAE compressor and a Graph Language Encoder that interleaves GNN layers with Transformer layers, which enables efficient, structure-aware generation. Empirical results show that GOFA outperforms pure LLM prompting and several graph-LLM baselines on many zero-shot tasks while remaining efficient and open to diverse prompt-driven queries, a meaningful step toward practical graph foundation models with cross-domain applicability.

Abstract

Foundation models, such as Large Language Models (LLMs) or Large Vision Models (LVMs), have emerged as some of the most powerful tools in their respective fields. However, unlike text and image data, graph data do not have a definitive structure, posing great challenges to developing a Graph Foundation Model (GFM). For example, current attempts at designing general graph models either transform graph data into a language format for LLM-based prediction or still train a GNN model with an LLM as an assistant. The former can handle unlimited tasks, while the latter captures graph structure much better; yet no existing work achieves both simultaneously. In this paper, we identify three key desirable properties of a GFM: self-supervised pretraining, fluidity in tasks, and graph awareness. To account for these properties, we extend conventional language modeling to the graph domain and propose a novel generative graph language model, GOFA. The model interleaves randomly initialized GNN layers into a frozen pre-trained LLM so that the semantic and structural modeling abilities are organically combined. GOFA is pre-trained on newly proposed graph-level next-word prediction, question-answering, and structural tasks to obtain the above GFM properties. The pre-trained model is further fine-tuned on downstream tasks to obtain task-solving ability. The fine-tuned model is evaluated on various downstream tasks, demonstrating a strong ability to solve structural and contextual problems in zero-shot scenarios. The code is available at https://github.com/JiaruiFeng/GOFA.
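
The interleaving idea in the abstract can be illustrated with a toy sketch. This is not the GOFA implementation: the layer functions, the fixed scalar "weights", the gate values, the sizes, and the choice of which layers are followed by a GNN step are all hypothetical stand-ins chosen for illustration. It only shows the control flow: a frozen per-node layer processes each node's text and memory tokens, and an optional graph step then passes messages between the memory tokens of neighbouring nodes.

```python
import math
import random

DIM = 4       # embedding width (toy value, not from the paper)
NUM_MEM = 2   # memory tokens appended after each node's text tokens (toy value)


def frozen_llm_layer(tokens, weight):
    """Stand-in for a frozen LLM layer: each token mixes only with tokens of
    its own node (here: their mean). `weight` is fixed and never trained."""
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(DIM)]
    return [[math.tanh(weight * (t[d] + mean[d])) for d in range(DIM)]
            for t in tokens]


def gnn_layer(node_tokens, edges, gate):
    """Stand-in for a trainable GNN layer: the k-th memory token of a node
    adds the gated mean of its neighbours' k-th memory tokens."""
    neighbours = {name: [] for name in node_tokens}
    for u, v in edges:  # edges are treated as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)

    def mem(name, k):  # k-th memory token = k-th of the last NUM_MEM tokens
        tokens = node_tokens[name]
        return tokens[len(tokens) - NUM_MEM + k]

    updated = {}
    for name, tokens in node_tokens.items():
        new_tokens = [list(t) for t in tokens]  # copy; inputs stay untouched
        for k in range(NUM_MEM if neighbours[name] else 0):
            msgs = [mem(nb, k) for nb in neighbours[name]]
            slot = new_tokens[len(tokens) - NUM_MEM + k]
            for d in range(DIM):
                slot[d] += gate * sum(m[d] for m in msgs) / len(msgs)
        updated[name] = new_tokens
    return updated


def encode(node_tokens, edges, llm_weights, gnn_gates):
    """Alternate frozen LLM layers with GNN layers. A gate of None means no
    GNN layer follows that LLM layer. Returns memory embeddings per node."""
    state = node_tokens
    for weight, gate in zip(llm_weights, gnn_gates):
        state = {n: frozen_llm_layer(t, weight) for n, t in state.items()}
        if gate is not None:
            state = gnn_layer(state, edges, gate)
    return {n: t[-NUM_MEM:] for n, t in state.items()}


def make_graph(seed=0):
    """Build a tiny hypothetical TAG: random token vectors for three nodes."""
    rng = random.Random(seed)
    lengths = {"a": 3, "b": 5, "c": 2}  # number of text tokens per node
    tokens = {n: [[rng.uniform(-1, 1) for _ in range(DIM)]
                  for _ in range(length + NUM_MEM)]
              for n, length in lengths.items()}
    return tokens, [("a", "b")]  # node "c" is isolated


tokens, edges = make_graph()
weights = [0.5, 0.8, 1.1]
with_gnn = encode(tokens, edges, weights, [None, 0.3, 0.3])
llm_only = encode(tokens, edges, weights, [None, None, None])
print("connected node changed by GNN:", with_gnn["a"] != llm_only["a"])
print("isolated node unchanged:", with_gnn["c"] == llm_only["c"])
```

In this sketch only the memory-token slots exchange information across nodes, so each node's text tokens are still processed by the frozen layer in isolation. A real model would replace the scalar gates with learned GNN parameters and the toy layer with actual Transformer blocks.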

Paper Structure

This paper contains 36 sections, 7 equations, 7 figures, and 17 tables.

Figures (7)

  • Figure 1: Examples of our pre-training tasks.
  • Figure 2: Task examples on a TAG: sentence completion / next-word prediction. The orange node $v$ represents the NOG.
  • Figure 3: GOFA architecture. Text tokens of the TAG's nodes/edges are concatenated with memory tokens and input to the Graph Language Encoder. GNN layers are interleaved into the LLM Compressor layers, where memory embeddings from each LLM Compressor layer are used as node/edge features for token-level GNN message passing. The memory embeddings are then used for teacher-forcing training.
  • Figure 4: Performance vs pre-training sample size.
  • Figure 5: Pre-training Tasks Ablation Study.
  • ...and 2 more figures