Size Transferability of Graph Transformers with Convolutional Positional Encodings

Javier Porras-Valenzuela; Zhiyang Wang; Alejandro Ribeiro

TL;DR

This work studies Graph Transformers (GTs) through the lens of manifold limit models for graph sequences and establishes a theoretical connection between GTs with Graph Neural Network (GNN) positional encodings and Manifold Neural Networks (MNNs), building on transferability results for GNNs under manifold convergence.

Abstract

Transformers have achieved remarkable success across domains, motivating the rise of Graph Transformers (GTs) as attention-based architectures for graph-structured data. A key design choice in GTs is the use of Graph Neural Network (GNN)-based positional encodings to incorporate structural information. In this work, we study GTs through the lens of manifold limit models for graph sequences and establish a theoretical connection between GTs with GNN positional encodings and Manifold Neural Networks (MNNs). Building on transferability results for GNNs under manifold convergence, we show that GTs inherit transferability guarantees from their positional encodings. In particular, GTs trained on small graphs provably generalize to larger graphs under mild assumptions. We complement our theory with extensive experiments on standard graph benchmarks, demonstrating that GTs exhibit scalable behavior on par with GNNs. To illustrate the efficiency of transferable GTs in a real-world scenario, we also apply them to shortest-path distance estimation over terrains. Our results provide new insights into GTs and suggest practical directions for training them efficiently in large-scale settings.
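The architecture described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the ring graph (a toy sample of a circle), the polynomial graph filter applied to random node signals, the single attention head, and all names (`ring_graph`, `conv_positional_encoding`, `graph_transformer`, `params`) are assumptions made for illustration. The point it shows is that no parameter depends on the number of nodes, so the same weights can be applied to a small graph and to a larger one.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def ring_graph(n):
    """Degree-normalized adjacency of an n-node ring (toy stand-in for a graph sampled from a circle)."""
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        S[i][(i + 1) % n] = 0.5
        S[i][(i - 1) % n] = 0.5
    return S

def conv_positional_encoding(S, taps, width, rng):
    """Hypothetical convolutional PE: tanh(sum_k taps[k] * S^k Z) with random input signals Z."""
    n = len(S)
    Z = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n)]
    out = [[0.0] * width for _ in range(n)]
    for h in taps:
        out = [[o + h * z for o, z in zip(o_row, z_row)] for o_row, z_row in zip(out, Z)]
        Z = matmul(S, Z)  # move to the next power of the shift operator
    return [[math.tanh(v) for v in row] for row in out]

def attention(H, Wq, Wk, Wv):
    """Single-head dense softmax attention over all nodes."""
    Q, K, V = matmul(H, Wq), matmul(H, Wk), matmul(H, Wv)
    scale = math.sqrt(len(Wq[0]))
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def graph_transformer(S, X, params, rng):
    """Add positional encodings to the node features, then apply attention."""
    P = conv_positional_encoding(S, params["taps"], len(X[0]), rng)
    H = [[x + p for x, p in zip(x_row, p_row)] for x_row, p_row in zip(X, P)]
    return attention(H, params["Wq"], params["Wk"], params["Wv"])

rng = random.Random(0)
d = 4  # feature width (illustrative)
params = {
    "taps": [0.5, 0.3, 0.2],  # filter coefficients; their count does not depend on graph size
    "Wq": [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)],
    "Wk": [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)],
    "Wv": [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)],
}

# Apply the same parameters to a small graph and to a larger graph.
outputs = {}
for n in (8, 32):
    X = [[math.cos(2 * math.pi * i / n)] * d for i in range(n)]  # smooth signal on the circle
    outputs[n] = graph_transformer(ring_graph(n), X, params, rng)
```

Each entry of `outputs` has one row per node of the corresponding graph, while `params` is shared between the two runs. In this sketch the only size-dependent objects are the graph operator and the node signals, which is the property that makes training on a small graph and evaluating on a larger one well defined.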

Paper Structure (43 sections, 10 theorems, 83 equations, 9 figures, 4 tables)

Introduction
Related work
Transferability and Generalization via Limit Models.
Efficient training on large graphs.
Length Generalization in Transformers.
GNN-Transformer Hybrids.
Graph Transformer With RPEARL Positional Encodings
Set up.
Graph Transformer
Positional encodings with Graph Neural Networks
Transferability Analysis of Graph Transformers via a Manifold Perspective
Discrete graphs and operator limits
Transferable Functions over Manifolds
Transferable Graph Transformers
Sparse Graph Transformer (Sparse GT).
...and 28 more sections

Key Result

Theorem 1

(Point-wise Convergence of GT to MT) For any $x\in {\mathcal{M}}$, under assumptions assm:manifold_signal through assm:transferable_pe, the pointwise output difference between a graph transformer and a manifold transformer is bounded, with probability at least $1-\delta$, by [...], where $A$ is a constant related to the geometry of ${\mathcal{M}}$, $d \geq 3$ is the intrinsic dimension of the manifold, $C_{QK

Figures (9)

Figure 1: Diagram of Graph Transformer (GT) with RPEARL Positional Encodings. The graph ${\mathbf G}$ is sampled from manifold ${\mathcal{M}}$. The graph structure is processed by RPEARL using a graph neural network. The positional encodings are added to the node features and passed to the graph transformer, which outputs node features ${\mathbf Y}$.
Figure 2: Transferability plots. For each dataset, the $x$ axis represents the train graph sizes as a proportion of the largest graph $(\alpha)$, and the $y$ axis is the test accuracy at the full-sized graph. The titles show dataset name and largest graph size.
Figure 3: Transferability heatmaps on Arxiv-year. The $x$ axis corresponds to train graph sizes as a proportion of the largest graph, and the $y$ axis corresponds to test graph size fractions. The color corresponds to the test accuracy at each setting.
Figure 4: Norway transferability. (a) Visualization of the Norway terrain graph in full resolution. (b) Transferability plot: the $x$ axis is the training graph size, and the $y$ axis is the test MAE on the full-resolution graph ($250\times250$).
Figure 5: Transferability plots for all datasets and architectures. For each dataset, the $x$ axis represents the train graph sizes as a proportion of the largest graph $(\alpha)$, and the $y$ axis is the test accuracy at the full-sized graph. The titles show dataset name and largest graph size.
...and 4 more figures
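
The evaluation protocol in the Figure 2 and Figure 5 captions (train on a graph with a fraction $\alpha$ of the nodes, test on the full graph) can be sketched as below. The source does not state how the smaller graphs are obtained, so uniform node sampling with an induced subgraph is an assumption here, and the function name `induced_subgraph` and the $10\times10$ grid are hypothetical.

```python
import random

def induced_subgraph(edges, num_nodes, alpha, seed=0):
    """Keep a uniformly random fraction alpha of the nodes and the edges among them.

    Assumption: uniform node sampling; the source does not specify the sampling scheme.
    """
    rng = random.Random(seed)
    size = max(1, round(alpha * num_nodes))
    keep = sorted(rng.sample(range(num_nodes), size))
    relabel = {old: new for new, old in enumerate(keep)}  # compact node ids
    sub_edges = [(relabel[u], relabel[v]) for u, v in edges
                 if u in relabel and v in relabel]
    return keep, sub_edges

# Toy full-size graph: a 10 x 10 grid with 4-neighbor connectivity.
side = 10
edges = []
for r in range(side):
    for c in range(side):
        node = r * side + c
        if c + 1 < side:
            edges.append((node, node + 1))     # right neighbor
        if r + 1 < side:
            edges.append((node, node + side))  # bottom neighbor

# One training graph per fraction alpha; alpha = 1.0 recovers the full graph.
splits = {alpha: induced_subgraph(edges, side * side, alpha)
          for alpha in (0.25, 0.5, 1.0)}
```

A model would then be trained on `splits[alpha]` for each $\alpha$ and evaluated on the full graph, giving one point per $\alpha$ on a transferability curve.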

Theorems & Definitions (19)

Definition 1: Transferable functions over Manifolds
Definition 2: Manifold Transformer (MT)
Theorem 1
Corollary 2
Corollary 3
Corollary 4
Proposition 1
Proposition 2
Lemma 5
proof
...and 9 more
