SAMGPT: Text-free Graph Foundation Model for Multi-domain Pre-training and Cross-domain Adaptation

Xingtong Yu, Zechuan Gong, Chang Zhou, Yuan Fang, Hui Zhang

TL;DR

SAMGPT addresses the challenge of building a universal graph foundation model from multi-domain, text-free graphs and adapting it to unseen domains. It introduces per-domain structure tokens to align topology during multi-domain pre-training, and a dual-prompt strategy (holistic and domain-specific prompts) for cross-domain adaptation that keeps the pre-trained encoder frozen. The resulting framework integrates feature and structure alignment; it shows strong one-shot and few-shot performance across seven datasets and is robust to varying homophily. Because it does not rely on textual attributes, the approach extends multi-domain graph pre-training and cross-domain deployment to text-free graphs.

Abstract

Graphs can model interconnected entities in many online services, supporting a wide range of applications on the Web. This raises an important question: How can we train a graph foundation model on multiple source domains and adapt it to an unseen target domain? A major obstacle is that graphs from different domains often exhibit divergent characteristics. Some studies leverage large language models to align multiple domains based on textual descriptions associated with the graphs, which limits their applicability to text-attributed graphs. For text-free graphs, a few recent works attempt to align different feature distributions across domains, while generally neglecting structural differences. In this work, we propose a novel Structure Alignment framework for text-free Multi-domain Graph Pre-Training and cross-domain adaptation (SAMGPT). It is designed to learn multi-domain knowledge from graphs originating in multiple source domains, which can then be adapted to address applications in an unseen target domain. Specifically, we introduce a set of structure tokens to harmonize structure-based aggregation across source domains during the pre-training phase. Next, for cross-domain adaptation, we design dual prompts, namely, holistic prompts and specific prompts, which adapt unified multi-domain structural knowledge and fine-grained, domain-specific information, respectively, to a target domain. Finally, we conduct comprehensive experiments on seven public datasets to evaluate and analyze the effectiveness of SAMGPT.
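To make the two ideas in the abstract more concrete, the following is a minimal, illustrative sketch in plain Python. It is not the authors' implementation. It assumes that a structure token is a vector that rescales neighbor messages element-wise during mean aggregation, that a specific prompt is a weighted mix of pre-trained per-domain tokens, and that the holistic and specific prompts are combined by simple addition. All function names (`aggregate`, `specific_prompt`, `combine_prompts`), shapes, and the combination rule are hypothetical choices made for illustration.

```python
# Illustrative sketch only. How tokens and prompts enter the model here is an
# assumption for exposition, not the mechanism defined in the paper.

def aggregate(features, adjacency, token):
    """One mean-aggregation step in which each neighbor message is rescaled
    element-wise by a structure token (assumed form of token injection)."""
    out = []
    for v, neighbors in enumerate(adjacency):
        # Neighbor messages are modulated by the token.
        msgs = [[x * t for x, t in zip(features[u], token)] for u in neighbors]
        # The node's own feature is included unmodulated (a design choice here).
        msgs.append(features[v])
        dim = len(features[v])
        out.append([sum(m[i] for m in msgs) / len(msgs) for i in range(dim)])
    return out


def specific_prompt(source_tokens, weights):
    """Mix pre-trained per-domain structure tokens with weights that would be
    learned on the target domain (assumed form of the specific prompt)."""
    dim = len(source_tokens[0])
    return [sum(w * tok[i] for w, tok in zip(weights, source_tokens))
            for i in range(dim)]


def combine_prompts(holistic, specific):
    """Combine the holistic and specific prompts by addition (hypothetical rule)."""
    return [h + s for h, s in zip(holistic, specific)]


# Toy target graph: a 3-node path with 2-dimensional features.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
adjacency = [[1], [0, 2], [1]]

# Two hypothetical source-domain tokens and a hypothetical holistic prompt.
source_tokens = [[1.0, 0.5], [0.5, 1.0]]
holistic = [1.0, 1.0]

prompt = combine_prompts(holistic, specific_prompt(source_tokens, [0.5, 0.5]))
adapted = aggregate(features, adjacency, prompt)
```

In this sketch only the prompt vectors and mixing weights would be tuned on the target domain; the aggregation function itself stays fixed, mirroring the idea of keeping the pre-trained encoder frozen.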

Paper Structure

This paper contains 21 sections, 13 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Motivation of SAMGPT.
  • Figure 2: Overall framework of SAMGPT.
  • Figure 3: Impact of number of shots on node and graph classification on four target domains.
  • Figure 4: Sensitivity study of $\alpha$ and $\beta$.