Recurrent Distance Filtering for Graph Representation Learning

Yuhui Ding; Antonio Orvieto; Bobby He; Thomas Hofmann

TL;DR

GRED (Graph Recurrent Encoding by Distance) addresses the difficulty of exploiting information from distant nodes in graph representation learning. For each target node, it aggregates the other nodes by their shortest distance to the target and encodes the resulting hop sequence with a diagonal linear RNN, which acts as a stable long-range filter and removes the need for positional encodings. The authors prove that the linear RNN mapping is injective and that GRED is more expressive than $1$-WL for $K>1$. Empirically, GRED is comparable to or better than graph transformers on benchmarks such as ZINC 12K and the Long Range Graph Benchmark, at a lower computational cost and without attention mechanisms.

Abstract

Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but lack graph inductive bias and have to rely on ad-hoc positional encoding. In this paper, we propose a new architecture to reconcile these challenges. Our approach stems from the recent breakthroughs in long-range modeling provided by deep state-space models: for a given target node, our model aggregates other nodes by their shortest distances to the target and uses a linear RNN to encode the sequence of hop representations. The linear RNN is parameterized in a particular diagonal form for stable long-range signal propagation and is theoretically expressive enough to encode the neighborhood hierarchy. With no need for positional encoding, we empirically show that the performance of our model is comparable to or better than that of state-of-the-art graph transformers on various benchmarks, with a significantly reduced computational cost. Our code is open-source at https://github.com/skeletondyh/GRED.
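The aggregation-then-recurrence idea described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes scalar node features, sum aggregation per hop, and a recurrence of the form state = lam * state + b * x applied from the farthest hop to the target, and it omits the MLPs and Layer Normalization of the full layer. The names `hop_sets`, `gred_encode`, `lam`, and `b` are hypothetical.

```python
from collections import deque


def hop_sets(adj, target, K):
    """Group nodes by shortest-path distance (0..K) from `target` using BFS."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        if dist[u] == K:  # do not expand beyond K hops
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    hops = [[] for _ in range(K + 1)]
    for node, d in dist.items():
        hops[d].append(node)
    return hops


def gred_encode(adj, feats, target, K, lam, b):
    """Sum-aggregate each hop, then run a diagonal linear RNN from hop K to hop 0.

    `lam` holds the diagonal (possibly complex) eigenvalues and `b` the input
    weights; both are illustrative parameters, assumed here rather than learned.
    """
    hops = hop_sets(adj, target, K)
    x = [sum(feats[n] for n in hop) for hop in hops]  # one scalar per hop
    state = [0j] * len(lam)
    for k in range(K, -1, -1):  # farthest hop first, target node last
        state = [l * s + bi * x[k] for l, s, bi in zip(lam, state, b)]
    return state


# Path graph 0 - 1 - 2 - 3 with target node 0 and K = 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
feats = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
out = gred_encode(adj, feats, target=0, K=2, lam=[0.5], b=[1.0])
print(out)  # single-entry state equal to 1 + 0.5 * 2 + 0.25 * 3 = 2.75
```

Because the farthest hop is fed first, its contribution is multiplied by the eigenvalue once per remaining step, so a hop at distance k ends up weighted by the k-th power of `lam`. Node 3 lies beyond K = 2 and is ignored.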

Paper Structure (22 sections, 3 theorems, 17 equations, 7 figures, 7 tables)

Introduction
Related Work
Multi-hop MPNNs.
Graph transformers.
State space models and linear RNNs.
Architecture
Preliminaries.
GRED layer.
Computational complexity.
Expressiveness Analysis
Experiments
Benchmarking GNNs.
ZINC 12K.
Long Range Graph Benchmark.
Training efficiency.
...and 7 more sections

Key Result

Theorem 4.1

Let $\{{\bm{x}}_v=({\bm{x}}_{v,0}, {\bm{x}}_{v,1}, {\bm{x}}_{v,2}, \dots, {\bm{x}}_{v,K_v})~|~v \in V\}$ be a set of sequences (of different lengths $K_v\le K$) of vectors with a (possibly uncountable) set of features $\mathcal{X}\subset\mathbb{R}^d$. Consider a diagonal linear complex-valued RNN wi

Figures (7)

Figure 1: Illustration of the filtering effect on the neighborhood, induced by the linear RNN. The filter weight is determined by the eigenvalues ${\bm{\Lambda}}$ of the transition matrix and the shortest distance to the target node. We expand on this in the Architecture section.
Figure 2: (a) Sketch of the architecture. MLPs and Layer Normalization operate independently at each node or aggregated multiset. Information of the distant nodes is propagated to the target node through a linear RNN, specifically an LRU (Orvieto et al., 2023). (b) Depiction of the GRED layer operation for two different target nodes. The gray rectangular boxes indicate the application of multiset aggregation. Finally, the new representation for the target node is computed from the RNN output through an MLP.
Figure 3: Learned (complex) eigenvalues of the first GRED layer on CIFAR10 and Peptides-func.
Figure 4: Effect of $K$ on performance.
Figure 5: Performance of GRED using RNNs of different flavors.
...and 2 more figures
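
The filtering effect named in the Figure 1 caption can be made concrete with a small sketch. It assumes that a node at shortest distance k from the target is weighted by the k-th power of an eigenvalue, which is what unrolling a diagonal linear recurrence gives. The function name `filter_weights` and the polar parameters `radius` and `theta` are hypothetical.

```python
import cmath


def filter_weights(radius, theta, K):
    """Return the weights lam**k for k = 0..K, with lam = radius * exp(i * theta)."""
    lam = cmath.rect(radius, theta)  # complex eigenvalue from polar form
    return [lam ** k for k in range(K + 1)]


weights = filter_weights(radius=0.9, theta=0.3, K=3)
for k, w in enumerate(weights):
    # The magnitude decays as radius**k; the phase rotates by theta per hop.
    print(k, round(abs(w), 4), round(cmath.phase(w), 4))
```

With a radius below 1 the magnitude shrinks with distance, so nearer hops dominate; a radius close to 1 keeps distant hops influential, and the phase lets different state dimensions respond to different distances.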

Theorems & Definitions (5)

Theorem 4.1: Injectivity of linear RNNs
Corollary 4.2
Corollary 4.3: Expressiveness of GRED
Proof
Proof
