
Sketch-Augmented Features Improve Learning Long-Range Dependencies in Graph Neural Networks

Ryien Hosseini, Filippo Simini, Venkatram Vishwanath, Rebecca Willett, Henry Hoffmann

TL;DR

This work tackles three key limitations of standard graph neural networks (GNNs): oversquashing, oversmoothing, and limited expressiveness. It introduces Sketched Random Features (SRF), which embed node features into a random kernel feature space and apply a Johnson–Lindenstrauss-style sketch to produce global, distance-preserving node representations. These representations are concatenated to the local activations at every GNN layer. The authors establish theoretical properties of SRF (unbiased kernel cross-terms, kernel distance sensitivity, cross-node information flow, almost-sure uniqueness, and permutation equivariance in expectation), along with practical complexity benefits. Empirically, SRF improves performance on synthetic benchmarks and on real-world tasks, including social networks, molecular out-of-distribution (OOD) generalization, and long-range peptide interactions, and it is complementary to existing positional encodings. Overall, SRF offers a scalable, architecture-agnostic enhancement that strengthens non-local reasoning in message-passing GNNs (MPGNNs) with minimal overhead.

Abstract

Graph Neural Networks learn on graph-structured data by iteratively aggregating local neighborhood information. While this local message passing paradigm imparts a powerful inductive bias and exploits graph sparsity, it also yields three key challenges: (i) oversquashing of long-range information, (ii) oversmoothing of node representations, and (iii) limited expressive power. In this work we inject randomized global embeddings of node features, which we term Sketched Random Features, into standard GNNs, enabling them to efficiently capture long-range dependencies. The embeddings are unique, distance-sensitive, and topology-agnostic, properties which we analytically and empirically show alleviate the aforementioned limitations when injected into GNNs. Experimental results on real-world graph learning tasks confirm that this strategy consistently improves performance over baseline GNNs, offering both a standalone solution and a complementary enhancement to existing techniques such as graph positional encodings. Our source code is available at https://github.com/ryienh/sketched-random-features.
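The mechanism summarized in the TL;DR (random kernel features, a Johnson–Lindenstrauss-style sketch, and concatenation at every layer) can be illustrated with a minimal NumPy sketch. It assumes an RBF kernel approximated by random Fourier features and a sketch made of k independent Gaussian projections of width D, so the total sketch dimension is k·D, mirroring the parameters in Figure 4. All function names, defaults, and these specific choices are hypothetical illustrations rather than the authors' implementation. In particular, this per-node construction does not attempt to reproduce the cross-node mixing referred to in Proposition 3.3.

```python
import numpy as np

def random_fourier_features(X, num_features, lengthscale, rng):
    """Hypothetical helper: map rows of X (n x d) to random Fourier features
    whose inner products approximate exp(-||x - y||^2 / (2 * lengthscale^2))."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def gaussian_sketch(Phi, k, D, rng):
    """Hypothetical helper: apply k independent Gaussian (JL-style) projections
    of width D and concatenate them. The 1/sqrt(k*D) scaling keeps the
    expected inner product of two sketched rows equal to that of the inputs."""
    R = Phi.shape[1]
    blocks = [Phi @ rng.normal(size=(R, D)) for _ in range(k)]
    return np.concatenate(blocks, axis=1) / np.sqrt(k * D)

def srf_features(X, num_features=256, k=8, D=64, lengthscale=1.0, seed=0):
    """Assumed SRF-style pipeline: kernel embedding, then sketch.
    Computed once per graph; the same random draws are shared by all nodes."""
    rng = np.random.default_rng(seed)
    Phi = random_fourier_features(X, num_features, lengthscale, rng)
    return gaussian_sketch(Phi, k, D, rng)

def augment(H, Z):
    """Concatenate the fixed sketch features Z to node states H at a layer."""
    return np.concatenate([H, Z], axis=1)

# Usage: 10 nodes with 5 input features each.
X = np.random.default_rng(1).normal(size=(10, 5))
Z = srf_features(X)          # shape (10, 8 * 64), computed once
H = np.zeros((10, 32))       # placeholder node states at some GNN layer
H_aug = augment(H, Z)        # shape (10, 32 + 512), fed to the next layer
```

Computing Z once and reusing it at every layer follows the description in Figure 3; only the node states H change from layer to layer.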

Paper Structure

This paper contains 53 sections, 5 theorems, 21 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Proposition 3.1

SRF provides an unbiased estimate of the cross-terms of the kernel matrix. Specifically, for any distinct nodes $i$ and $j$, the corresponding cross-term of the SRF Gram matrix is unbiased: $\mathbb{E}_{X, G}[\mathbf{z}_i^\top \mathbf{z}_j] = \kappa(\mathbf{x}_i, \mathbf{x}_j)$.
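Proposition 3.1 can be checked numerically under the same assumptions as before: an RBF kernel κ, random Fourier features, and a Gaussian sketch. The expectation is read here as being taken over both sources of randomness, which is our interpretation of the subscript in $\mathbb{E}_{X, G}$. The Monte Carlo sketch below redraws the random features and sketch many times and averages $\mathbf{z}_i^\top \mathbf{z}_j$. The helper `srf_pair` and all parameter values are hypothetical.

```python
import numpy as np

def srf_pair(x_i, x_j, R, D, lengthscale, rng):
    """Hypothetical helper: one random draw of z_i^T z_j for two nodes."""
    # Random Fourier features for the RBF kernel, shared by both nodes.
    W = rng.normal(scale=1.0 / lengthscale, size=(x_i.size, R))
    b = rng.uniform(0.0, 2.0 * np.pi, size=R)
    phi = np.sqrt(2.0 / R) * np.cos(np.stack([x_i, x_j]) @ W + b)
    # Gaussian JL sketch of width D, scaled so that E[G G^T] = I.
    G = rng.normal(size=(R, D)) / np.sqrt(D)
    z = phi @ G
    return z[0] @ z[1]

rng = np.random.default_rng(0)
x_i, x_j = np.array([0.0, 1.0]), np.array([0.5, 0.2])
lengthscale = 1.0
kappa = np.exp(-np.sum((x_i - x_j) ** 2) / (2 * lengthscale ** 2))
estimates = [srf_pair(x_i, x_j, R=64, D=16, lengthscale=lengthscale, rng=rng)
             for _ in range(20000)]
print(f"kernel value {kappa:.3f}, Monte Carlo mean {np.mean(estimates):.3f}")
```

Any single draw is noisy, since its variance shrinks with the feature count R and sketch width D. The average over draws, however, approaches κ, which is what unbiasedness asserts.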

Figures (5)

  • Figure 1: Analysis of oversquashing and oversmoothing across model variants. (a) Accuracy vs. graph radius showing the effect of oversquashing across methods. (b) Dirichlet energy across layers for different methods. (c) Dirichlet energy across layers for $\mathcal{E}_{\mathcal{L}}$ with varying $k$. Lower Dirichlet energy indicates more oversmoothing.
  • Figure 2: Training efficiency comparison of SRF (ours), R-PEARL, and B-PEARL. (a) Runtime in seconds. (b) Memory usage in MB (log scale).
  • Figure 3: Block diagram visualizing the SRF method defined in Algorithm 1. SRF is computed once (top, blue) and then concatenated to node states at every GNN layer during training (bottom, green).
  • Figure 4: Hyperparameter sensitivity analysis on REDDIT-M using $(\mathcal{E}_{\text{linear}}, S_{AG}^{(k)})$. (a) Performance (accuracy $\%$) vs. projection count $k$ with $D=64$ fixed. (b) Performance (accuracy $\%$) vs. total sketch dimension $k \cdot D$ with $k=8$ fixed.
  • Figure : Sketched Feature GNN
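Figure 1 uses Dirichlet energy as its oversmoothing measure. The NumPy sketch below implements one common definition, $\frac{1}{n}\sum_{(i,j) \in E} \lVert \mathbf{h}_i - \mathbf{h}_j \rVert^2$ with each undirected edge counted once. The paper's exact normalization (for example, degree-normalized variants) is not stated here, so this choice and the name `dirichlet_energy` are assumptions.

```python
import numpy as np

def dirichlet_energy(H, edge_index):
    """Assumed definition: (1/n) * sum over undirected edges (i, j),
    each listed once, of ||h_i - h_j||^2."""
    src, dst = edge_index
    diffs = H[src] - H[dst]
    return float(np.sum(diffs ** 2) / H.shape[0])

# Path graph 0-1-2-3, each undirected edge listed once.
edges = np.array([[0, 1, 2], [1, 2, 3]])
H_varied = np.array([[0.0], [1.0], [0.0], [1.0]])  # neighbors differ
H_smoothed = np.ones((4, 1))                       # identical states: oversmoothed
print(dirichlet_energy(H_varied, edges))    # 0.75
print(dirichlet_energy(H_smoothed, edges))  # 0.0
```

When all node states collapse to the same vector the energy is zero, which is why lower Dirichlet energy in Figure 1 indicates more oversmoothing.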

Theorems & Definitions (9)

  • Proposition 3.1: Unbiased Cross-Terms in the Kernel Matrix
  • proof
  • Proposition 3.2: Kernel Distance Sensitivity
  • proof
  • Proposition 3.3: Cross-Node Information
  • Proposition 3.4: Almost Sure Uniqueness
  • proof
  • Proposition 3.5: Permutation Equivariance in Expectation
  • proof