Topology Only Pre-Training: Towards Generalised Multi-Domain Graph Models

Alex O. Davies, Riku W. Green, Nirav S. Ajmeri, Telmo M. Silva Filho

TL;DR

ToP addresses the challenge of cross-domain graph pre-training by removing node and edge features during pre-training to focus on topology. It demonstrates that multi-domain, topology-only pre-training yields positive transfer across diverse graph domains, with significant improvements in a majority of tasks ($75\%$ of experiments, $p\leq0.01$) and strong benefits when downstream fine-tuning includes features ($85.7\%$ of tasks). Reintroducing features at transfer is supported by input-head swapping, enabling downstream use of domain features without sacrificing the topology-driven bias. The results show ToP can compete with or exceed generalist graph models, including large LLM-based approaches, while offering lower computational cost, and point to a new research direction where domain diversity in topology drives robust foundation-model-like behavior for graphs.

Abstract

The principal benefit of unsupervised representation learning is that a pre-trained model can be fine-tuned where data or labels are scarce. Existing approaches for graph representation learning are domain-specific, maintaining consistent node and edge features across the pre-training and target datasets. This has precluded transfer to multiple domains. We present Topology Only Pre-Training (ToP), a graph pre-training method based on node and edge feature exclusion. We show positive transfer on evaluation datasets from multiple domains, including domains not present in pre-training data, running directly contrary to assumptions made in contemporary works. On 75% of experiments, ToP models perform significantly better ($p \leq 0.01$) than a supervised baseline. Performance is significantly positive on 85.7% of tasks when node and edge features are used in fine-tuning. We further show that out-of-domain topologies can produce more useful pre-training than in-domain topologies. Under ToP we show better transfer from non-molecule pre-training, compared to molecule pre-training, on 79% of molecular benchmarks. Against the limited set of other generalist graph models, ToP performs strongly, including against models many orders of magnitude larger. These findings show that ToP opens broad areas of research in both transfer learning on scarcely populated graph domains and in graph foundation models.

Paper Structure (53 sections, 1 equation, 9 figures, 14 tables)

Figures (9)

  • Figure 1: Graph data has rich, highly varied feature sets between domains. Here we provide model examples from social media (left) and chemistry (right). Social media feature sets might include a combination of textual, image and numeric types. Chemical feature sets are primarily numeric and categorical, representing properties of component atoms that might be difficult to infer for deep-learning models.
  • Figure 2: A schematic for ToP models. During pre-training, features are replaced with a single integer label for each node or edge (all identical), which are then passed to a single-layer MLP. This results in an identical input vector for each node and edge. During transfer, a different input head is used, moving the original dimensions to the hidden dimensionality of the GNN encoder block. This allows arbitrary node and edge features to be included in transfer. Omitting the output head of the model allows transfer onto node and edge-level tasks.
  • Figure 3: UMAP embeddings of encodings from each model, as well as an untrained GIN model. Triangular markers show molecular graphs, and circles non-molecules. We plot the centroid of each dataset for added clarity. Qualitatively the untrained and ToP-Chem models are noticeably more fragmented than the ToP-Social and ToP-All models. In turn the ToP-Social model is more fragmented than the ToP-All model.
  • Figure 4: A UMAP embedding of encodings from an untrained encoder.
  • Figure 5: A UMAP embedding of encodings from the ToP-Chem model, pre-trained on only molecules.
  • ...and 4 more figures
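
The input-head swap described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`make_linear`, `apply_linear`, `strip_features`, `message_passing`), the hidden size of 4, the toy star graph, and the three-dimensional node features are all assumptions made for illustration. The parameter-free sum aggregation is only a stand-in for the GNN encoder block, and edge features are omitted for brevity.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Hypothetical single-layer input head: random weights (in_dim x out_dim), zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(out_dim)] for _ in range(in_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def apply_linear(head, rows):
    """Project each feature row to the hidden dimensionality."""
    weights, bias = head
    columns = list(zip(*weights))  # one tuple of weights per output dimension
    return [
        [sum(x * w for x, w in zip(row, col)) + b for col, b in zip(columns, bias)]
        for row in rows
    ]


def strip_features(num_nodes):
    """Feature exclusion: every node receives the same constant label."""
    return [[1.0] for _ in range(num_nodes)]


def message_passing(hidden, edges):
    """One round of sum aggregation over undirected edges (stand-in for the encoder)."""
    out = [list(h) for h in hidden]
    for u, v in edges:
        for k in range(len(hidden[0])):
            out[u][k] += hidden[v][k]
            out[v][k] += hidden[u][k]
    return out


rng = random.Random(0)
HIDDEN = 4
edges = [(0, 1), (1, 2), (1, 3)]  # star graph: node 1 is the hub

# Pre-training: identical inputs, so only topology can separate the nodes.
pretrain_head = make_linear(1, HIDDEN, rng)
h0 = apply_linear(pretrain_head, strip_features(4))
h1 = message_passing(h0, edges)

# Transfer: swap in a new head sized for the downstream features (3 per node here),
# projecting them to the same hidden size the encoder expects.
node_features = [[0.1, 1.0, 0.0], [0.7, 0.0, 1.0], [0.3, 1.0, 0.0], [0.9, 0.0, 0.0]]
transfer_head = make_linear(3, HIDDEN, rng)
t0 = apply_linear(transfer_head, node_features)
t1 = message_passing(t0, edges)
```

In this sketch every node starts from the same vector during pre-training, so after aggregation the three leaf nodes remain indistinguishable from one another while the hub differs, which is the sense in which the representation is driven by topology alone. Because both heads project to the same hidden size, the encoder stand-in is reused unchanged at transfer.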