GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators

Mattia Rigotti; Nicholas Thumiger; Thomas Frick

Abstract

Adapting transformer positional encoding to meshes and graph-structured data presents significant computational challenges: exact spectral methods require cubic-complexity eigendecomposition and can inadvertently break gauge invariance through numerical solver artifacts, while efficient approximate methods sacrifice gauge symmetry by design. Both failure modes cause catastrophic generalization failures in inductive learning, where models trained with one set of numerical choices fail when encountering different spectral decompositions of similar graphs or discretizations of the same mesh. We propose GIST (Gauge-Invariant Spectral Transformers), a new graph transformer architecture that resolves this challenge by achieving end-to-end $\mathcal{O}(N)$ complexity through random projections while algorithmically preserving gauge invariance via inner-product-based attention on the projected embeddings. We prove that GIST achieves discretization-invariant learning with bounded mismatch error, enabling parameter transfer across arbitrary mesh resolutions for neural operator applications. Empirically, GIST matches the state of the art on standard graph benchmarks (e.g., achieving 99.50% micro-F1 on PPI) while uniquely scaling to mesh-based neural operator benchmarks with up to 750K nodes, achieving state-of-the-art aerodynamic prediction on the challenging DrivAerNet and DrivAerNet++ datasets.
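The gauge problem described in the abstract can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: the ring graph, the embedding size `d`, and the rotation angles are all assumptions chosen for illustration. On a ring, non-zero Laplacian eigenvalues come in degenerate pairs, so a solver may return any rotation of each pair (and either sign of any eigenvector). The sketch checks that such a change of gauge alters the embedding coordinates but leaves all pairwise inner products unchanged.

```python
import numpy as np

# Toy graph: a ring of N nodes (illustrative assumption, not from the paper).
N = 12
idx = np.arange(N)
A = np.zeros((N, N))
A[idx, (idx + 1) % N] = 1.0
A[(idx + 1) % N, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian

# Spectral embedding from the d smallest eigenpairs.
# For the ring: one zero eigenvalue, then two degenerate pairs (1 + 2 + 2 = 5).
d = 5
vals, vecs = np.linalg.eigh(L)
phi = vecs[:, :d]


def rot(theta):
    """2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# A gauge transformation: a sign flip on the first eigenvector and an
# arbitrary rotation inside each degenerate eigenspace.
G = np.zeros((d, d))
G[0, 0] = -1.0
G[1:3, 1:3] = rot(0.7)
G[3:5, 3:5] = rot(-1.3)
phi_alt = phi @ G

# phi_alt is an equally valid eigenbasis for the same eigenvalues...
still_eigenvectors = np.allclose(L @ phi_alt, phi_alt * vals[:d])
# ...but its coordinates differ from phi,
coords_changed = not np.allclose(phi, phi_alt)
# while the matrix of pairwise inner products is identical.
gram_invariant = np.allclose(phi @ phi.T, phi_alt @ phi_alt.T)
```

A model that consumes the raw coordinates of `phi` sees different inputs for `phi` and `phi_alt`, whereas one that only uses inner products between node embeddings does not.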

Paper Structure (42 sections, 5 theorems, 10 equations, 3 figures, 5 tables, 3 algorithms)

Introduction
Two Barriers to Scalable Graph Transformers.
The Gauge Invariance Challenge.
Our Contribution: GIST.
Related Works
Graph Transformers.
Scalable Attention Architectures.
Neural Operators.
Positioning GIST.
Approach
Preliminaries
Self-attention and positional encoding.
Graph Laplacian and spectral embeddings.
Approximate spectral embeddings and the gauge invariance problem.
Motivation for gauge-invariant operations.
...and 27 more sections

Key Result

Proposition 1

Outputs of GIST applied to different discretizations of the same $m$-dimensional manifold, obtained by randomly sampling nodes, converge to each other with error $\mathcal{O}(n^{-1/(m+4)})$, where $n$ is the coarser resolution. This ensures that learned parameters transfer across arbitrary mesh resolutions.
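To get a feel for the rate in Proposition 1, the short sketch below evaluates the scaling factor $n^{-1/(m+4)}$ alone. The hidden constant in the $\mathcal{O}(\cdot)$ bound is unknown and omitted, so only ratios are meaningful. The helper names and the choice $m = 2$ (a surface mesh) are assumptions for illustration.

```python
def mismatch_rate(n: int, m: int) -> float:
    """Scaling factor n ** (-1 / (m + 4)) from the bound, constants omitted."""
    return n ** (-1.0 / (m + 4))


def growth_to_halve(m: int) -> int:
    """Factor by which n must grow for the scaling factor to halve: 2 ** (m + 4)."""
    return 2 ** (m + 4)


# Illustrative assumption: m = 2, i.e. a two-dimensional surface mesh.
m = 2
ratio = mismatch_rate(64 * 1000, m) / mismatch_rate(1000, m)  # 64 ** (-1/6) = 0.5
factor = growth_to_halve(m)  # 2 ** 6 = 64
```

Under this assumption, halving the bound requires 64 times as many nodes at the coarser resolution, so the convergence is guaranteed but slow in $n$.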

Figures (3)

Figure 1: Gauge-Invariant Spectral Transformer. Left: Gauge-Invariant Spectral Self-Attention operates on graph positional embeddings $\tilde{\phi}$ as queries and keys, and node features $x$ as values. The output of the self-attention operation is then combined with $x$ through a residual connection. Limiting $\tilde{\phi}$ to queries and keys preserves gauge invariance across the self-attention block. Right: Gauge-Invariant Spectral Self-Attention is embedded in a Multi-Scale Gauge-Invariant Spectral Transformer Block which comprises 3 parallel branches inspired by EfficientViT.
Figure 2: Sensitivity study of GIST spectral embedding parameters. The plots show the final test accuracy of a two-block Gauge-Invariant Spectral Self-Attention linear transformer trained on Cora while sweeping over the power iteration parameter $k$ with $r=256$ (left panel), and sweeping over the embedding dimension $r$ with $k=32$ (right panel). Test accuracy is fairly robust around the best value of either parameter. As expected, accuracy increases monotonically with $r$, since higher $r$ corresponds to better approximations of the eigenmaps. Accuracy saturates relatively quickly, justifying the use of reasonably low $r$. The plots show mean test accuracy across 10 seeds, with the standard deviation as error bars.
Figure 3: Scalability study of GIST. All experiments use a fixed 3-layer model while varying the hidden dimensionality. VRAM consumption was measured as a function of the number of nodes in the input graph. Graph sizes were controlled using random node dropout applied to samples from the DrivAerNet dataset, enabling a systematic evaluation of memory scaling behavior.
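The attention pattern in the Figure 1 caption (positional embeddings as queries and keys, node features as values, residual connection) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's block: it uses unnormalized linear attention, feeds the embeddings in directly with no learned query or key projection, and uses random stand-ins for the embeddings `phi`, the features `x`, and a hypothetical value projection `w_v`. Multiplying keys and values first avoids ever forming the $N \times N$ attention matrix, so cost and memory grow linearly in the number of nodes.

```python
import numpy as np


def spectral_linear_attention(phi, x, w_v):
    """Unnormalized linear attention: phi as queries and keys, x @ w_v as values."""
    v = x @ w_v          # (N, d) values computed from node features
    kv = phi.T @ v       # (r, d) key-value summary, cost O(N * r * d)
    return x + phi @ kv  # residual connection; no N x N matrix is built


rng = np.random.default_rng(0)
N, r, d = 1000, 16, 8  # nodes, embedding dimension, feature dimension (assumed)
phi = rng.standard_normal((N, r)) / np.sqrt(r)  # stand-in positional embeddings
x = rng.standard_normal((N, d))                 # stand-in node features
w_v = rng.standard_normal((d, d)) / np.sqrt(d)  # hypothetical value projection

# Random orthogonal change of gauge applied to the embedding coordinates.
gauge, _ = np.linalg.qr(rng.standard_normal((r, r)))

out = spectral_linear_attention(phi, x, w_v)
out_gauge = spectral_linear_attention(phi @ gauge, x, w_v)
outputs_match = np.allclose(out, out_gauge)
```

Because the embeddings enter only through the product `phi @ phi.T` (evaluated right to left), the orthogonal factor cancels and the output is unchanged. Adding a learned projection on the query or key side would generally break this cancellation, which is consistent with the caption's remark that the embeddings are limited to queries and keys.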

Theorems & Definitions (8)

Proposition: Informal
Proposition 1.1: Full Statement
Proposition 1.2
proof
Proposition 1.3
proof
Proposition 1.4
proof
