Graph Transformers Dream of Electric Flow
Xiang Cheng, Lawrence Carin, Suvrit Sra
TL;DR
This work analyzes how a linear Transformer that sees graph data only through the incidence matrix can implement fundamental Laplacian-based algorithms. It provides explicit weight configurations that realize electric flow (and hence $L^{\dagger}$, $L^{\dagger 1/2}$, and related operators), the heat kernel, a multiplicative polynomial expansion, and subspace iteration for computing eigenvectors, each with rigorous error bounds that depend on the number of layers. The authors also introduce a parameter-efficient variant and show that a Transformer can learn useful positional encodings (PEs) for molecular regression tasks, sometimes outperforming Laplacian-based encodings. Empirical results on synthetic graphs and real-world molecular datasets corroborate the theory: a few layers suffice to approximate these linear-algebraic targets, and learned PEs can improve downstream performance.
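Electric flow reduces to solving $L\phi = \psi$ with $L = BB^\top$, where $B$ is the signed incidence matrix and $\psi$ is a demand vector that sums to zero. The minimal NumPy sketch below assumes unit edge weights and uses plain gradient descent as the underlying algorithm. It is not the paper's weight construction, and the helper names `incidence_matrix` and `electric_flow` are hypothetical. It illustrates why access to $B$ alone suffices: every update needs only products with $B$ and $B^\top$.

```python
import numpy as np

# Hypothetical helpers: a generic sketch, not the paper's Transformer weights.
def incidence_matrix(n, edges):
    """Signed n x m incidence matrix B: column j has +1 at u and -1 at v for edge j = (u, v)."""
    B = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        B[u, j] = 1.0
        B[v, j] = -1.0
    return B

def electric_flow(B, psi, num_steps=200):
    """Approximate potentials phi = L^+ psi and the flow f = B^T phi, where L = B B^T.

    Runs gradient descent on 0.5 * phi^T L phi - psi^T phi. Each step reads the
    graph only through products with B and B^T.
    """
    # Step size 1 / lambda_max(L); lambda_max(L) equals the squared spectral norm of B.
    step = 1.0 / np.linalg.norm(B, 2) ** 2
    phi = np.zeros(B.shape[0])
    for _ in range(num_steps):
        residual = psi - B @ (B.T @ phi)  # psi - L phi
        phi = phi + step * residual
    return phi, B.T @ phi

# Usage: path 0-1-2-3 with unit resistances. Inject one unit at node 0 and extract it at node 3.
B = incidence_matrix(4, [(0, 1), (1, 2), (2, 3)])
psi = np.array([1.0, 0.0, 0.0, -1.0])  # demands must sum to zero
phi, f = electric_flow(B, psi)
print(np.round(phi, 6), np.round(f, 6))
```

Because $\psi$ sums to zero and $\phi$ starts at zero, the iterates stay orthogonal to the all-ones vector. The potentials therefore converge to the mean-zero solution $L^\dagger\psi = (1.5, 0.5, -0.5, -1.5)$, and each edge carries one unit of flow, as expected for three unit resistors in series.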
Abstract
We show theoretically and empirically that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems such as electric flow and eigenvector decomposition. The Transformer has access to information on the input graph only via the graph's incidence matrix. We present explicit weight configurations for implementing each algorithm, and we bound the constructed Transformers' errors by the errors of the underlying algorithms. Our theoretical findings are corroborated by experiments on synthetic data. Additionally, on a real-world molecular regression task, we observe that the linear Transformer is capable of learning a more effective positional encoding than the default one based on Laplacian eigenvectors. Our work is an initial step towards elucidating the inner workings of the Transformer for graph data. Code is available at https://github.com/chengxiang/LinearGraphTransformer
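Eigenvector decomposition, the second canonical problem above, is what Laplacian positional encodings require. The following generic NumPy sketch illustrates subspace iteration; it is not the authors' construction, and the helper names are hypothetical. It recovers the eigenvectors of $L = BB^\top$ with the smallest nonzero eigenvalues by iterating on the shifted matrix $cI - L$ and projecting out the constant vector. The shift $c = 2\,d_{\max}$ is our choice. It is valid because $\lambda_{\max}(L) \le 2\,d_{\max}$ for an unweighted graph, which keeps the shifted matrix positive semidefinite. The sketch assumes a connected graph.

```python
import numpy as np

# Hypothetical sketch: generic subspace iteration, not the paper's Transformer weights.
def incidence_matrix(n, edges):
    """Signed n x m incidence matrix B: +1 at u and -1 at v for edge j = (u, v)."""
    B = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        B[u, j] = 1.0
        B[v, j] = -1.0
    return B

def bottom_laplacian_eigvecs(B, k, num_steps=500, seed=0):
    """Estimate the k eigenpairs of L = B B^T with the smallest nonzero eigenvalues."""
    n = B.shape[0]
    # Shift c = 2 * max degree >= lambda_max(L), so M = c I - L is PSD and the wanted
    # eigenvectors of L become the dominant eigenvectors of M.
    c = 2.0 * np.abs(B).sum(axis=1).max()
    ones = np.ones((n, 1)) / np.sqrt(n)
    Q = np.random.default_rng(seed).standard_normal((n, k))
    for _ in range(num_steps):
        Z = c * Q - B @ (B.T @ Q)   # M Q, using only products with B and B^T
        Z -= ones @ (ones.T @ Z)    # remove the constant eigenvector (eigenvalue 0 of L)
        Q, _ = np.linalg.qr(Z)      # re-orthonormalize the block
    eigvals = np.diag(Q.T @ (B @ (B.T @ Q)))  # Rayleigh quotients
    return eigvals, Q

# Usage: path graph on 6 nodes; estimate the two smallest nonzero eigenpairs.
B = incidence_matrix(6, [(i, i + 1) for i in range(5)])
eigvals, Q = bottom_laplacian_eigvecs(B, k=2)
print(np.round(eigvals, 6))
```

On the 6-node path, the estimates should match $2 - 2\cos(\pi/6) \approx 0.268$ and $2 - 2\cos(\pi/3) = 1$. The columns come out ordered by the shifted eigenvalues, so the first column approximates the Fiedler vector, up to sign.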
