AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers

Wenhao Zhu; Guojie Song; Liang Wang; Shaoguo Liu

TL;DR

AnchorGT is a novel attention architecture for GTs with a global receptive field and almost linear complexity. It serves as a flexible building block that improves the scalability of a wide range of GT models. The authors also prove theoretically that AnchorGT attention can be strictly more expressive than the Weisfeiler-Lehman test, showing its strength in representing graph structures.

Abstract

Graph Transformers (GTs) have significantly advanced the field of graph representation learning by overcoming the limitations of message-passing graph neural networks (GNNs) and demonstrating promising performance and expressive power. However, the quadratic complexity of the self-attention mechanism in GTs has limited their scalability, and previous approaches to this problem often suffer from expressiveness degradation or lack of versatility. To address this issue, we propose AnchorGT, a novel attention architecture for GTs with a global receptive field and almost linear complexity, which serves as a flexible building block to improve the scalability of a wide range of GT models. Inspired by anchor-based GNNs, we employ a structurally important $k$-dominating node set as anchors and design an attention mechanism that focuses on the relationship between individual nodes and anchors, while retaining the global receptive field for all nodes. With its intuitive design, AnchorGT can easily replace the attention module in various GT models with different network architectures and structural encodings, resulting in reduced computational overhead without sacrificing performance. In addition, we theoretically prove that AnchorGT attention can be strictly more expressive than the Weisfeiler-Lehman test, showing its superiority in representing graph structures. Our experiments on three state-of-the-art GT models demonstrate that their AnchorGT variants can achieve better results while being faster and significantly more memory efficient.
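The anchor idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: a $k$-dominating set is taken to mean a node set such that every node lies within $k$ hops of some member; the set is chosen with a simple greedy heuristic; and each node is assumed to attend to its $k$-hop neighborhood plus all anchors. The function names (`bfs_within`, `greedy_k_dominating_set`, `anchor_attention`) are hypothetical, random projections stand in for learned weights, and structural encodings are omitted.

```python
from collections import deque

import numpy as np


def bfs_within(adj, source, k):
    """Return {node: hop distance} for every node within k hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def greedy_k_dominating_set(adj, k):
    """Greedy heuristic (an assumption, not necessarily the paper's method):
    repeatedly pick the node whose k-hop ball covers the most uncovered nodes."""
    n = len(adj)
    balls = [set(bfs_within(adj, v, k)) for v in range(n)]
    uncovered = set(range(n))
    anchors = []
    while uncovered:
        best = max(range(n), key=lambda v: len(balls[v] & uncovered))
        anchors.append(best)
        uncovered -= balls[best]
    return anchors, balls


def anchor_attention(x, adj, k, seed=0):
    """Single-head attention where node v attends only to its k-hop ball
    plus the anchor set (assumed receptive field), instead of all n nodes."""
    n, d = x.shape
    anchors, balls = greedy_k_dominating_set(adj, k)
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned query/key/value weights.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, key, val = x @ w_q, x @ w_k, x @ w_v
    out = np.zeros((n, d))
    for v in range(n):
        field = sorted(balls[v] | set(anchors))
        scores = key[field] @ q[v] / np.sqrt(d)
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[v] = weights @ val[field]
    return out, anchors


# Usage on a 7-node path graph given as adjacency lists.
path = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
features = np.random.default_rng(1).standard_normal((7, 4))
output, anchor_nodes = anchor_attention(features, path, k=1)
```

In this sketch the per-node cost scales with the size of the $k$-hop ball plus the number of anchors rather than with the total number of nodes, which is the intuition behind replacing full quadratic attention. Every node is within $k$ hops of an anchor, so information can still reach any node through the anchors.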

Paper Structure (23 sections, 7 equations, 5 figures, 4 tables, 1 algorithm)

This paper contains 23 sections, 7 equations, 5 figures, 4 tables, 1 algorithm.

Introduction
Related Work
Graph Transformer
Anchor-based Graph Neural Network
Proposed Approach
Notations
$k$-Dominating Set Anchor
Anchor-based Attention
Structural Encoding
Anchor-based Attention with Sampling-based Graph Transformers
Expressiveness of AnchorGT
Experiments
Datasets and Experimental Settings
AnchorGT Achieves Full Attention Performance
AnchorGT is Fast and Memory Efficient
...and 8 more sections

Figures (5)

Figure 1: An illustration of the proposed AnchorGT.
Figure 2: GPU Memory Consumption
Figure 3: Time of Each Training Epoch
Figure 5: Relative Performance and Memory Cost (%) of GraphGPS-AnchorGT-SPD on the qm9 dataset. The settings are the same as in Table \ref{tbl2}.
Figure 6: Two graphs in the proof for Fact \ref{fact2}.

Theorems & Definitions (6)

Definition 3.1: $k$-dominating set
Definition 4.1: Neighbor-Distinguishable Structural Encoding
Definition 4.2: Anchor-Distinguishable Structural Encoding
Definition A.1: Discriminative Power of Randomized Graph Models
proof
proof
