Polynormer: Polynomial-Expressive Graph Transformer in Linear Time
Chenhui Deng, Zichao Yue, Zhiru Zhang
TL;DR
Polynormer tackles the scalability gap in graph transformers by combining high-degree polynomial expressivity with linear-time computation. It introduces a polynomial-expressive base model and derives permutation-equivariant local and global attention modules from it, assembled in a local-to-global architecture that maintains linear complexity. The resulting $L$-layer model can express a polynomial of degree $2^L$. Experiments show strong performance across 13 datasets, including large-scale graphs, even without nonlinear activations; adding activations yields gains of up to ~4%. This work demonstrates a practical path to highly expressive, scalable graph transformers suitable for large real-world graphs.
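To make the $2^L$ degree claim concrete, the following minimal sketch tracks the degree of a univariate polynomial through a stack of toy layers. The layer form `(w * p) * (p + b)`, the helper names, and the scalar setting are assumptions chosen for illustration only; they are not the paper's actual base model, which operates on node feature matrices. The sketch only shows why multiplying two affine functions of the previous output doubles the degree at every layer.

```python
# Toy illustration (not the paper's model): polynomials in one variable x are
# stored as coefficient lists, lowest degree first, e.g. [0, 1] represents x.

def poly_add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def degree(p):
    """Return the index of the highest nonzero coefficient."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return 0

def toy_layer(p, w, b):
    """Hypothetical layer: the product of two affine maps of the input,
    (w * p) * (p + b). The product is what doubles the degree."""
    scaled = [w * c for c in p]
    shifted = poly_add(p, [b])
    return poly_mul(scaled, shifted)

def degrees_after_layers(num_layers, w=2, b=1):
    """Apply toy_layer repeatedly to p(x) = x and record the degree."""
    p = [0, 1]
    degrees = []
    for _ in range(num_layers):
        p = toy_layer(p, w, b)
        degrees.append(degree(p))
    return degrees

print(degrees_after_layers(4))  # prints [2, 4, 8, 16]
```

With positive integer weights the leading coefficient never cancels, so the degree after $L$ toy layers is exactly $2^L$, with no nonlinear activation involved.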
Abstract
Graph transformers (GTs) have emerged as a promising architecture that is theoretically more expressive than message-passing graph neural networks (GNNs). However, typical GT models have at least quadratic complexity and thus cannot scale to large graphs. Although several linear GTs have recently been proposed, they still lag behind their GNN counterparts on several popular graph datasets, which raises a critical concern about their practical expressivity. To balance the trade-off between expressivity and scalability of GTs, we propose Polynormer, a polynomial-expressive GT model with linear complexity. Polynormer is built upon a novel base model that learns a high-degree polynomial on input features. To make the base model permutation equivariant, we integrate it with graph topology and node features separately, resulting in local and global equivariant attention models. Consequently, Polynormer adopts a linear local-to-global attention scheme to learn high-degree equivariant polynomials whose coefficients are controlled by attention scores. Polynormer has been evaluated on $13$ homophilic and heterophilic datasets, including large graphs with millions of nodes. Our extensive experimental results show that Polynormer outperforms state-of-the-art GNN and GT baselines on most datasets, even without the use of nonlinear activation functions.
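The gap between quadratic and linear attention cost mentioned above can be illustrated with a generic reordering trick that is common in the linear-attention literature. The abstract does not state which formulation Polynormer's global attention uses, so the sketch below is an assumption for illustration: it drops normalization and any feature map, and the function names are hypothetical. It shows only that $(QK^\top)V$ and $Q(K^\top V)$ give the same result, while the second form never builds an $N \times N$ matrix.

```python
# Generic illustration (not the paper's attention module): matrices are
# nested lists, and integer entries keep the comparison exact.

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Return the transpose of matrix a."""
    return [list(row) for row in zip(*a)]

def quadratic_attention(q, k, v):
    """(Q K^T) V: the intermediate Q K^T is N x N, so cost grows as N^2."""
    return matmul(matmul(q, transpose(k)), v)

def linear_attention(q, k, v):
    """Q (K^T V): the intermediate K^T V is d x d, so cost grows as N."""
    return matmul(q, matmul(transpose(k), v))

# N = 4 nodes, d = 2 features per node (arbitrary example values).
Q = [[1, 2], [0, 1], [3, 1], [2, 2]]
K = [[1, 0], [2, 1], [0, 3], [1, 1]]
V = [[1, 1], [0, 2], [2, 0], [1, 3]]

same = quadratic_attention(Q, K, V) == linear_attention(Q, K, V)
print(same)  # prints True
```

The equality follows from associativity of matrix multiplication. Softmax attention does not factor this way, which is why linear-attention variants typically replace the softmax with separate transformations of the queries and keys.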
