Graph Transformers without Positional Encodings

Ayush Garg

TL;DR

The paper tackles the problem of injecting graph inductive biases, such as locality and connectivity, into Graph Transformers without handcrafted positional encodings, for graph data that has no canonical node ordering. It proposes Eigenformer, whose spectrum-aware attention computes potentials $\sigma_k[i,j]$ from the Laplacian eigenvectors $u_k$ and eigenvalues $\lambda_k$ and learns frequency importances via $\phi_2(\lambda_k)$. The final attention is $\alpha[i,j] = \mathrm{softmax}_j\left(\phi_1\left(\sum_k \sigma_k[i,j]\phi_2(\lambda_k)\right)\right)$. The theoretical contributions prove that this mechanism can express graph connectivity matrices and is invariant both to eigenvector sign and to the choice of basis within degenerate eigenspaces, while empirical results on standard benchmarks show performance competitive with state-of-the-art Graph Transformers. This PE-free approach offers a robust, scalable way to capture both local and long-range structure and reduces the need for extensive hand-designed encodings.
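To make the attention formula concrete, here is a minimal NumPy sketch of one forward pass. It rests on assumptions not stated in this summary: the potential is taken to be the squared eigenvector difference $\sigma_k[i,j] = (u_k[i]-u_k[j])^2$ (one choice consistent with the sign- and basis-invariance claims), and $\phi_1$, $\phi_2$ are fixed toy functions ($\phi_1(s) = -s$, $\phi_2(\lambda) = e^{-\lambda}$) standing in for the learned ones. The function names are hypothetical; this is not the paper's implementation.

```python
import numpy as np


def normalized_laplacian(A: np.ndarray) -> np.ndarray:
    """L_norm = I - D^{-1/2} A D^{-1/2}; isolated nodes contribute no A_norm entries."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros(A.shape[0])
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def spectrum_aware_attention(A, phi1=lambda s: -s, phi2=lambda lam: np.exp(-lam)):
    """alpha[i, j] = softmax_j(phi1(sum_k sigma_k[i, j] * phi2(lambda_k))).

    ASSUMPTIONS: sigma_k[i, j] = (u_k[i] - u_k[j])**2, and phi1 / phi2 are
    fixed toy functions standing in for the learned ones.
    """
    lam, U = np.linalg.eigh(normalized_laplacian(A))  # U[:, k] is u_k
    Ut = U.T                                            # Ut[k] is u_k
    sigma = (Ut[:, :, None] - Ut[:, None, :]) ** 2      # sigma[k, i, j]
    # Weight each frequency by phi2(lambda_k) and sum over k.
    scores = phi1(np.einsum("kij,k->ij", sigma, phi2(lam)))
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


# Usage: a 4-node path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
alpha = spectrum_aware_attention(A)
print(np.round(alpha, 3))
```

With $\phi_2(\lambda) = e^{-\lambda}$, the summed potential equals $H_{ii} + H_{jj} - 2H_{ij}$ for the heat kernel $H = e^{-L_{norm}}$, so under the assumed $\sigma_k$ it depends only on eigenspaces, not on the particular eigenvectors the solver returns. On this path graph, node 0 consequently attends more to its neighbour (node 1) than to the far end (node 3).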

Abstract

Recently, Transformers for graph representation learning have become increasingly popular, achieving state-of-the-art performance on a wide variety of graph datasets, either alone or in combination with message-passing graph neural networks (MP-GNNs). Infusing graph inductive biases into the innately structure-agnostic transformer architecture in the form of structural or positional encodings (PEs) is key to achieving these impressive results. However, designing such encodings is tricky, and disparate attempts have been made to engineer them, including Laplacian eigenvectors, relative random-walk probabilities (RRWP), spatial encodings, centrality encodings, and edge encodings. In this work, we argue that such encodings may not be required at all, provided the attention mechanism itself incorporates information about the graph structure. We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph, and empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks. Additionally, we theoretically prove that Eigenformer can express various graph structural connectivity matrices, which is particularly essential when learning over smaller graphs.
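The abstract lists Laplacian eigenvectors among the hand-designed encodings that are tricky to use. One well-known source of difficulty is that eigenvectors are defined only up to sign, and up to an arbitrary rotation when an eigenvalue repeats. The NumPy sketch below is our own illustration, not code from the paper: the 4-cycle, the rotation angle and the helper `summed_potential` are hypothetical choices. It shows that raw per-node eigenvector entries change under such a rotation, while a pairwise squared-difference potential summed over the eigenspace does not.

```python
import numpy as np

# 4-cycle 0-1-2-3-0: every node has degree 2, so L_norm = I - A / 2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.eye(4) - A / 2
lam, U = np.linalg.eigh(L)  # eigenvalues 0, 1, 1, 2 (up to rounding)


def summed_potential(V: np.ndarray) -> np.ndarray:
    """P[i, j] = sum over the columns v of V of (v[i] - v[j])**2."""
    diff = V[:, None, :] - V[None, :, :]
    return (diff ** 2).sum(axis=-1)


# Sign ambiguity: a flipped eigenvector still satisfies L u = lambda u.
u = U[:, -1]
sign_flip_valid = np.allclose(L @ (-u), lam[-1] * (-u))

# Basis ambiguity: any rotation of the two eigenvectors with lambda = 1
# is an equally valid orthonormal basis of that eigenspace.
V = U[:, np.isclose(lam, 1.0)]
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
V_rot = V @ R

raw_pe_changed = not np.allclose(V, V_rot)
potential_unchanged = np.allclose(summed_potential(V), summed_potential(V_rot))
print(sign_flip_valid, raw_pe_changed, potential_unchanged)  # True True True
```

The invariance holds because $\sum_k (v_k[i]-v_k[j])^2 = (e_i - e_j)^\top V V^\top (e_i - e_j)$, and the projector $V V^\top$ is the same for every orthonormal basis of the eigenspace.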

Paper Structure (15 sections, 2 theorems, 18 equations, 5 figures, 6 tables)


Introduction
Theoretical Motivations
Graph Laplacian and its spectrum
Spectral convolution
Model Architecture
Attention using the Laplacian spectrum
Eigenformer architecture
Experimental Results
Related Work
Conclusion
Proofs
Proposition 1
Experiment Details
Description of Datasets
Hyperparameters

Key Result

Proposition 1

For any $n \in \mathbb{N}$, consider the adjacency matrix $A$ drawn from the set of adjacency matrices of $n$-node undirected graphs, $\mathbb{G}_n \subset \{0,1\}^{n \times n}$. Further, let $L_{norm} = I - D^{-\frac{1}{2}}AD^{-\frac{1}{2}} = I - A_{norm}$ be the normalized graph Laplacian of the graph […] for suitable functions $\phi_1$ and $\phi_2$, where $SPD[i,j]$ is the shortest path distance between […]
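Proposition 1 is stated in terms of two matrices derived from the same adjacency matrix $A$: the normalized Laplacian $L_{norm}$, whose eigenpairs feed the attention, and the shortest-path-distance matrix $SPD$, one of the connectivity matrices the paper claims the attention can express. The following is a minimal NumPy sketch of both, assuming an unweighted, undirected graph without self-loops; the function names are ours, not the paper's, and the proof's construction of $\phi_1$ and $\phi_2$ (not reproduced in this excerpt) is not implemented.

```python
import numpy as np
from collections import deque


def normalized_laplacian(A: np.ndarray) -> np.ndarray:
    """L_norm = I - D^{-1/2} A D^{-1/2}; isolated nodes contribute no A_norm entries."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros(A.shape[0])
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def shortest_path_distances(A: np.ndarray) -> np.ndarray:
    """All-pairs shortest-path distances by BFS from every node (np.inf if unreachable)."""
    n = A.shape[0]
    spd = np.full((n, n), np.inf)
    neighbours = [np.flatnonzero(A[v]) for v in range(n)]
    for source in range(n):
        spd[source, source] = 0.0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if np.isinf(spd[source, w]):  # first visit = shortest distance
                    spd[source, w] = spd[source, v] + 1.0
                    queue.append(w)
    return spd


# Usage: 5-node path graph 0-1-2-3-4.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

lam = np.linalg.eigvalsh(normalized_laplacian(A))
spd = shortest_path_distances(A)
print(np.round(lam, 3))
print(spd)
```

The eigenvalues of $L_{norm}$ always lie in $[0, 2]$; for a connected bipartite graph such as this path, the smallest is $0$ and the largest is exactly $2$, and $SPD[i,j] = |i - j|$.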

Figures (5)

Figure 1: Example molecule from PCQM4Mv2 dataset: Substructures are revealed by eigenvectors
Figure 2: Eigenformer Architecture
Figure 3: (Smoothed) potential $\sigma_k$ vs eigenvalue $\lambda_k$
Figure 4: k-hop neighborhood, shortest-path distance and learned attention matrices for an example graph from the ZINC dataset
Figure 5: Percentage change in MAE with decreasing number (k) of eigenvalues used

Theorems & Definitions (2)

Proposition 1
Proposition 2
