Graph Transformers without Positional Encodings
Ayush Garg
TL;DR
The paper tackles the problem of injecting graph inductive biases into Graph Transformers without handcrafted positional encodings, so that attention can capture locality and connectivity in graph data, which has no canonical node ordering. It proposes Eigenformer, whose spectrum-aware attention computes pairwise potentials $\sigma_k[i,j]$ from the Laplacian eigenvectors $u_k$ and learns the importance of each frequency from the eigenvalues $\lambda_k$ via $\phi_2(\lambda_k)$. The final attention weights are $\alpha[i,j] = \mathrm{softmax}_j\left(\phi_1\left(\sum_k \sigma_k[i,j]\phi_2(\lambda_k)\right)\right)$. On the theoretical side, the paper proves that this mechanism can express graph connectivity matrices and is invariant to the sign of each eigenvector and to the choice of basis within degenerate eigenspaces. Empirically, it reports performance competitive with state-of-the-art Graph Transformers on standard benchmarks. This PE-free approach offers a robust, scalable way to capture local and long-range structure and reduces the need for extensive hand-designed encodings.
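The attention formula in the TL;DR can be illustrated with a minimal NumPy sketch. Several pieces here are assumptions rather than the paper's definitions: the summary does not define the potential, so the sketch takes $\sigma_k[i,j] = (u_k[i] - u_k[j])^2$, one choice that is unchanged when an eigenvector flips sign; the learned maps $\phi_1$ and $\phi_2$ are replaced by fixed placeholder functions; and the names `spectral_scores`, `spectrum_aware_attention`, and `inverse_nonzero` are hypothetical.

```python
import numpy as np


def spectral_scores(adj, phi2):
    """Return sum_k sigma_k[i, j] * phi2(lambda_k) for the Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    lam, U = np.linalg.eigh(lap)  # columns of U are the eigenvectors u_k
    # Assumed potential (not taken from the source): (u_k[i] - u_k[j])**2.
    diff = U[:, None, :] - U[None, :, :]  # diff[i, j, k] = u_k[i] - u_k[j]
    return np.einsum("ijk,k->ij", diff ** 2, phi2(lam))


def spectrum_aware_attention(adj, phi1, phi2):
    """alpha[i, j] = softmax_j(phi1(sum_k sigma_k[i, j] * phi2(lambda_k)))."""
    scores = phi1(spectral_scores(adj, phi2))
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def inverse_nonzero(lam, tol=1e-8):
    """Placeholder for phi2: 1 / lambda on non-zero eigenvalues, 0 otherwise."""
    out = np.zeros_like(lam)
    np.divide(1.0, lam, out=out, where=lam > tol)
    return out


# Path graph 0 - 1 - 2 - 3 as a toy input.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

# Placeholder for phi1: negate, so structurally closer nodes get more weight.
alpha = spectrum_aware_attention(adj, lambda s: -s, inverse_nonzero)
print(alpha.round(3))
```

Under these assumed choices, the pre-softmax score is the effective resistance between nodes $i$ and $j$, which on a path graph equals the hop distance $|i - j|$. This is one concrete way a weighted sum over the spectrum can reproduce a connectivity matrix; in Eigenformer itself $\phi_1$ and $\phi_2$ are learned rather than fixed.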
Abstract
Recently, Transformers for graph representation learning have become increasingly popular, achieving state-of-the-art performance on a wide variety of graph datasets, either alone or in combination with message-passing graph neural networks (MP-GNNs). Infusing graph inductive biases into the innately structure-agnostic transformer architecture in the form of structural or positional encodings (PEs) is key to achieving these impressive results. However, designing such encodings is tricky, and disparate attempts have been made to engineer them, including Laplacian eigenvectors, relative random-walk probabilities (RRWP), spatial encodings, centrality encodings, edge encodings, etc. In this work, we argue that such encodings may not be required at all, provided the attention mechanism itself incorporates information about the graph structure. We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph, and empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks. Additionally, we theoretically prove that Eigenformer can express various graph structural connectivity matrices, which is particularly essential when learning over smaller graphs.
