Scalable Expressiveness through Preprocessed Graph Perturbations

Danial Saber, Amirali Salehi-Abari

TL;DR

The paper tackles the expressivity–scalability gap in graph neural networks by introducing SE2P, a framework that precomputes feature diffusion over multiple perturbed graphs and merges their representations. SE2P offers four configurations (C1–C4) that trade preprocessing-based scalability for learnable aggregations to achieve higher expressiveness, with reported speedups up to $8\times$ on real datasets and improved generalization over strong baselines. Empirical results on TU and OGB benchmarks show SE2P variants often surpass 1-WL–limited baselines while maintaining competitive runtime, and ablation studies confirm the benefit of graph perturbations for expressiveness. The approach provides a practical and flexible path to scalable, expressive GNNs, with future work aimed at exploring more perturbation policies, theoretical analysis, and adaptive perturbation budgeting.

Abstract

Graph Neural Networks (GNNs) have emerged as the predominant method for analyzing graph-structured data. However, canonical GNNs have limited expressive power and generalization capability, which has triggered the development of more expressive yet computationally intensive methods. One such approach is to create a series of perturbed versions of input graphs and then repeatedly conduct multiple message-passing operations on all variations during training. Despite its expressive power, this approach does not scale well to larger graphs. To address this scalability issue, we introduce Scalable Expressiveness through Preprocessed Graph Perturbation (SE2P). This model offers a flexible, configurable balance between scalability and generalizability with four distinct configuration classes. At one extreme, the configuration prioritizes scalability through minimal learnable feature extraction and extensive preprocessing; at the other extreme, it enhances generalizability with more learnable feature extractions, though at the cost of reduced scalability. We conduct extensive experiments on real-world datasets to evaluate the generalizability and scalability of SE2P variants compared to various state-of-the-art benchmarks. Our results indicate that, depending on the chosen SE2P configuration, the model can enhance generalizability compared to benchmarks while achieving significant speed improvements of up to 8-fold.
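The limited expressive power mentioned in the abstract is commonly characterized by the 1-WL graph isomorphism test, which Figure 1 also uses. The following is a minimal sketch of 1-WL colour refinement, assuming an illustrative pair of graphs (a 6-cycle and two disjoint triangles) that is not necessarily the pair shown in the paper. The helper names `wl_histogram` and `remove_node` are hypothetical and chosen only for this example.

```python
from collections import Counter


def wl_histogram(adj, rounds=3):
    """Run 1-WL colour refinement and return the histogram of final colours.

    `adj` maps each node to a list of its neighbours. Colours are nested
    tuples, so histograms of different graphs can be compared directly.
    """
    colours = {v: () for v in adj}  # every node starts with the same colour
    for _ in range(rounds):
        # New colour = (own colour, sorted multiset of neighbour colours).
        colours = {
            v: (colours[v], tuple(sorted(colours[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colours.values())


def remove_node(adj, node):
    """Return a copy of the graph with `node` and its incident edges removed."""
    return {v: [u for u in nbrs if u != node]
            for v, nbrs in adj.items() if v != node}


# Illustrative pair (an assumption, not necessarily the graphs in Figure 1):
# a 6-cycle and two disjoint triangles. Both are 2-regular on six nodes.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

same_before = wl_histogram(hexagon) == wl_histogram(two_triangles)
same_after = (wl_histogram(remove_node(hexagon, 0))
              == wl_histogram(remove_node(two_triangles, 0)))
print(same_before, same_after)
```

Here `same_before` is true because every node in both graphs has degree two, so refinement never separates them. `same_after` is false: removing a node turns the cycle into a five-node path, while the other graph becomes an intact triangle plus a single edge, and the refined colours of the degree-one nodes differ.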

Paper Structure (14 sections, 18 equations, 3 figures, 9 tables)

Figures (3)

  • Figure 1: The $1$-WL graph isomorphism test (a) fails to distinguish between two non-isomorphic graphs $G_1$ and $G_2$, but (b) successfully distinguishes their perturbations (obtained through node removal) $G_1^{\prime}$ and $G_2^{\prime}$.
  • Figure 2: The SE2P framework first generates $R$ perturbations of the input graph, where each perturbation involves randomly removing some nodes, thereby resulting in new adjacency and feature matrices $(\mathbf{A}_r,\mathbf{X}_r)$. Next, node features are diffused for each perturbation by a set of diffusion matrices: the powers of the perturbed adjacency matrix. Then, the $\mathrm{COMBINE}$ function combines these diffused features for each perturbed graph to produce feature matrices $\mathbf{Z}_r$. All these matrices are then passed to the $\mathrm{MERGE}$ function to generate a single nodal representation matrix $\mathbf{Z}$ for the input graph. Focusing on the graph-level task, we then apply $\mathrm{POOL}$ to yield a graph (vector) representation $z_G$ and obtain the predicted output through a non-linear transformation by an MLP. The functions $\mathrm{COMBINE}$, $\mathrm{MERGE}$, and $\mathrm{POOL}$ can be either non-learnable (blue circle) or learnable (red circle). This flexibility allows us to choose between different configuration classes (C1, C2, C3, and C4) to balance scalability, achieved by including more preprocessing steps (blue line), and expressivity, achieved by having more learnable components and a longer training phase (red line).
  • Figure 3: Sensitivity analysis on the TU datasets, investigating the impact of the introduced hyperparameters in each configuration ($x$-axis) on the classification accuracy ($y$-axis). The star indicates the optimal hyperparameters. Each search combination is marked as better than, comparable to (within a difference of 0.2%), or worse than DropGNN. (a) SE2P-C1 is insensitive to the searched hyperparameters but usually achieves weaker results than DropGNN. (b) SE2P-C2 remains insensitive to the hyperparameters on all datasets except PTC-MR, where sub-optimal hyperparameters give weaker results than the optimal ones. (c) In SE2P-C3, sub-optimal search combinations perform considerably worse on IMDB-M and PTC-MR, but on the other datasets the model remains insensitive to the hyperparameters, achieving results comparable to or better than DropGNN. (d) SE2P-C4 is relatively robust to varying the hyperparameters on all datasets except PTC-MR.
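
The pipeline described in the Figure 2 caption can be sketched in NumPy for the case where $\mathrm{COMBINE}$, $\mathrm{MERGE}$, and $\mathrm{POOL}$ are all non-learnable. This is a rough sketch under several assumptions that the caption does not specify: removed nodes are modelled by zeroing their rows and columns so that shapes stay aligned across perturbations, the adjacency matrix is used unnormalized, the zeroth power is included, and concatenation, mean, and sum stand in for the three functions. The names `perturb`, `diffuse`, `se2p_preprocess`, and `drop_prob` are hypothetical.

```python
import numpy as np


def perturb(A, X, drop_prob, rng):
    """Randomly remove nodes (assumption: zero their rows/columns, keep shapes)."""
    keep = rng.random(A.shape[0]) >= drop_prob
    mask = keep.astype(A.dtype)
    A_r = A * mask[:, None] * mask[None, :]  # drop edges touching removed nodes
    X_r = X * mask[:, None]                  # drop features of removed nodes
    return A_r, X_r


def diffuse(A_r, X_r, L):
    """Return [X_r, A_r X_r, ..., A_r^L X_r] via repeated multiplication."""
    out = [X_r]
    for _ in range(L):
        out.append(A_r @ out[-1])
    return out


def se2p_preprocess(A, X, R=4, L=2, drop_prob=0.2, seed=0):
    """Compute a graph vector z_G with no learnable parameters."""
    rng = np.random.default_rng(seed)
    Z_rs = []
    for _ in range(R):
        A_r, X_r = perturb(A, X, drop_prob, rng)
        # COMBINE (assumed): concatenate the diffused features per node.
        Z_rs.append(np.concatenate(diffuse(A_r, X_r, L), axis=1))
    Z = np.mean(Z_rs, axis=0)  # MERGE (assumed): mean over the R perturbations
    return Z.sum(axis=0)       # POOL (assumed): sum over nodes


# Toy input: a 4-node path graph with 3 random features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(1).normal(size=(4, 3))
z_G = se2p_preprocess(A, X)
print(z_G.shape)
```

Because nothing in this variant is learned, `z_G` could be computed once per graph before training and only the MLP would need gradient updates. Replacing any of the three assumed functions with a learnable module would move that step, and everything after it, into the training phase.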