GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers

Guoguo Ai, Guansong Pang, Hezhe Qiao, Yuan Gao, Hui Yan

TL;DR

This work tackles the limited ability of Graph Transformers to capture high-frequency graph signals caused by the low-pass bias of self-attention. It introduces GrokFormer, which uses a Graph Fourier Kolmogorov-Arnold Network (GFKAN) to learn an order- and spectrum-adaptive spectral filter $h(\lambda)$ through Fourier-series activations across a $K$-order spectrum, combined with an efficient self-attention path. The approach yields superior expressiveness and scalability, backed by theoretical results and extensive experiments on 10 node-classification datasets and 5 graph-classification datasets, where GrokFormer consistently outperforms state-of-the-art GTs and GNNs. The work provides a practical, highly expressive GT framework with a public implementation, offering significant potential for improved graph representation learning across diverse domains.

Abstract

Graph Transformers (GTs) have demonstrated remarkable performance in graph representation learning over popular graph neural networks (GNNs). However, self-attention, the core module of GTs, preserves only low-frequency signals in graph features, making it ineffective at capturing other important signals such as high-frequency ones. Some recent GT models help alleviate this issue, but their flexibility and expressiveness are still limited since the filters they learn are fixed on a predefined graph spectrum or spectral order. To tackle this challenge, we propose a Graph Fourier Kolmogorov-Arnold Transformer (GrokFormer), a novel GT model that learns highly expressive spectral filters with adaptive graph spectrum and spectral order through Fourier series modeling over learnable activation functions. We demonstrate theoretically and empirically that the proposed GrokFormer filter offers better expressiveness than other spectral methods. Comprehensive experiments on 10 real-world node classification datasets across various domains, scales, and graph properties, as well as 5 graph classification datasets, show that GrokFormer outperforms state-of-the-art GTs and GNNs. Our code is available at https://github.com/GGA23/GrokFormer

Paper Structure

This paper contains 32 sections, 4 theorems, 24 equations, 7 figures, 11 tables.

Key Result

Proposition 4.1

Our graph filter $h(\lambda)$ is learnable in both spectral order and graph spectrum: the spectral order $k$ is adaptively determined by the coefficient $\alpha_k$, while the spectrum $\lambda$ at a specific order $k$ is adaptively determined by the coefficients $a_{km}$ and $b_{km}$.
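
The explicit formula for $h(\lambda)$ does not appear in this excerpt. As a rough illustration only, the sketch below assumes one form consistent with the description: each order $k$ contributes a truncated Fourier series in $\lambda^k$ with coefficients $a_{km}$ and $b_{km}$, and the orders are mixed by weights $\alpha_k$. The function names `fourier_basis` and `spectral_filter`, the indexing from $m=0$, and the use of plain Python lists in place of learnable parameters are all assumptions made for this example, not the authors' implementation.

```python
import math

def fourier_basis(lam, k, a_k, b_k):
    """Assumed order-k basis: sum_m a_k[m]*cos(m*lam**k) + b_k[m]*sin(m*lam**k)."""
    x = lam ** k  # k-th power of the eigenvalue (the "k-order spectrum")
    return sum(a * math.cos(m * x) + b * math.sin(m * x)
               for m, (a, b) in enumerate(zip(a_k, b_k)))

def spectral_filter(lam, alpha, a, b):
    """Assumed filter: h(lam) = sum_{k=1..K} alpha[k-1] * basis_k(lam).

    alpha: K order weights; a, b: K lists of Fourier coefficients each.
    """
    return sum(alpha_k * fourier_basis(lam, k, a[k - 1], b[k - 1])
               for k, alpha_k in enumerate(alpha, start=1))

# Hypothetical coefficients: K = 2 orders, two Fourier terms (m = 0, 1) per order.
alpha = [0.0, 1.0]            # only the second-order term is active
a = [[0.0, 0.0], [0.0, 1.0]]  # cosine coefficients a[k-1][m]
b = [[0.0, 0.0], [0.0, 0.0]]  # sine coefficients b[k-1][m]

# With these values the filter reduces to cos(lam**2).
response = [spectral_filter(lam, alpha, a, b) for lam in (0.0, 1.0, 2.0)]
```

In this sketch, setting some $\alpha_k$ to zero switches the corresponding order off, which is one way to read "order-adaptive"; varying $a_{km}$ and $b_{km}$ reshapes the response within a single order. In a trained model these values would be learned rather than fixed by hand.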

Figures (7)

  • Figure 1: (a) The frequency response range of $K$ filter bases $\{b_1(\lambda),b_2(\lambda),\cdots,b_{k=K}(\lambda)\}, k\in[1,K]$ for GrokFormer, Specformer, and polynomial filters at the spectrum $\lambda$ w.r.t. spectral order $k$, where colors represent the varying frequency components of the spectrum at different orders. Polynomial filters typically have fixed bases, e.g., $\lambda, \lambda^2, \cdots, \lambda^K$, corresponding to the $K$ filter curves that capture the specific curvilinear frequencies, whereas Specformer adaptively learns the filter bases at the first-order spectrum, enabling it to capture arbitrary frequency responses in the spectrum plane of $k=1$. In contrast, our GrokFormer filter bases are capable of capturing arbitrary frequency responses across $K$ different spectral planes. (b) Low-comb filter (ground truth) and the approximations produced by the GrokFormer filter, the Specformer filter, and the Bernstein polynomial filter in BernNet.
  • Figure 2: Overview of GrokFormer. In addition to using self-attention to capture global information in the spatial domain, GrokFormer introduces a novel Graph Fourier KAN to achieve global graph modeling in the spectral domain. This design enables strong adaptability in both spectral order and graph spectrum, offering superior expressive power in capturing diverse graph frequency signals. GrokFormer combines the spatial and spectral representations through a standard summation and normalization layer, followed by a Feed-Forward Network (FFN) layer for prediction.
  • Figure 3: Filters learned by our GrokFormer on Cora and Citeseer (homophilic graphs), and Squirrel and Texas (heterophilic graphs). See the appendix for the other datasets.
  • Figure 4: Order adaptivity analysis results on two homophilic graphs, Cora and Pubmed, and two heterophilic graphs, Squirrel and Chameleon. See the appendix for the other datasets.
  • Figure 5: Illustrations of six filters and their approximations learned by our GrokFormer filter, BernNet, and Specformer.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Proposition 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Proposition 4.4
  • proof
  • proof
  • proof
  • proof