Probabilistic Graph Rewiring via Virtual Nodes

Chendi Qian, Andrei Manolache, Christopher Morris, Mathias Niepert

TL;DR

This work tackles the persistent problems of under-reaching and over-squashing in message-passing GNNs by introducing implicit probabilistic rewiring through a small set of virtual nodes. An upstream model learns priors for connecting original nodes to virtual nodes under exact-$k$ constraints, and differentiable k-subset sampling enables end-to-end optimization without the quadratic costs of graph transformers. The resulting IPR-MPNN architecture provides long-range information flow with sub-quadratic complexity, theoretically surpasses standard MPNNs in expressiveness, and delivers state-of-the-art or competitive results across diverse graph and molecular benchmarks while maintaining superior efficiency. The approach offers a scalable, adaptable framework for capturing long-range dependencies in large graphs, with practical implications for chemistry, biology, and network science.

Abstract

Message-passing graph neural networks (MPNNs) have emerged as a powerful paradigm for graph-based machine learning. Despite their effectiveness, MPNNs face challenges such as under-reaching and over-squashing, where limited receptive fields and structural bottlenecks hinder information flow in the graph. While graph transformers hold promise in addressing these issues, their scalability is limited due to quadratic complexity regarding the number of nodes, rendering them impractical for larger graphs. Here, we propose implicitly rewired message-passing neural networks (IPR-MPNNs), a novel approach that integrates implicit probabilistic graph rewiring into MPNNs. By introducing a small number of virtual nodes, i.e., adding additional nodes to a given graph and connecting them to existing nodes, in a differentiable, end-to-end manner, IPR-MPNNs enable long-distance message propagation, circumventing quadratic complexity. Theoretically, we demonstrate that IPR-MPNNs surpass the expressiveness of traditional MPNNs. Empirically, we validate our approach by showcasing its ability to mitigate under-reaching and over-squashing effects, achieving state-of-the-art performance across multiple graph datasets. Notably, IPR-MPNNs outperform graph transformers while maintaining significantly faster computational efficiency.
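As a rough illustration of the rewiring step described above, the following sketch connects each original node to exactly $k$ virtual nodes by sampling from per-node priors. It is a forward-pass toy only, under stated assumptions: the Gumbel top-k trick stands in for the paper's differentiable $k$-subset sampler (the abstract does not name the sampler), the priors are hard-coded rather than produced by an upstream MPNN, and the names `sample_exactly_k` and `rewire` are hypothetical.

```python
import math
import random


def sample_exactly_k(logits, k, rng):
    """Pick exactly k virtual-node indices for one original node.

    Assumption: Gumbel top-k perturbation is used here as a stand-in
    for the differentiable k-subset sampler; no gradients are computed.
    """
    perturbed = []
    for j, theta in enumerate(logits):
        # Clamp to avoid log(0) when drawing the Gumbel noise.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((theta + gumbel, j))
    perturbed.sort(reverse=True)
    return sorted(j for _, j in perturbed[:k])


def rewire(priors, k, seed=0):
    """Return (original node, virtual node) edges, k per original node."""
    rng = random.Random(seed)
    edges = []
    for v, logits in enumerate(priors):
        for j in sample_exactly_k(logits, k, rng):
            edges.append((v, j))
    return edges


# Toy priors (hypothetical values): 5 original nodes, 3 virtual nodes.
priors = [
    [2.0, 0.1, -1.0],
    [0.0, 0.0, 0.0],
    [-1.0, 3.0, 0.5],
    [1.0, 1.0, 1.0],
    [0.2, -0.3, 2.5],
]
k = 2
edges = rewire(priors, k)

n = len(priors)
added_edges = len(edges)            # n * k edges to virtual nodes
all_pairs = n * (n - 1) // 2        # pairs a dense attention layer would touch
```

With $n$ original nodes the sketch adds $n \cdot k$ edges, which grows linearly in $n$ for a fixed number of virtual nodes, whereas dense pairwise attention touches $n(n-1)/2$ node pairs.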

Paper Structure

This paper contains 34 sections, 7 theorems, 15 equations, 7 figures, 11 tables.

Key Result

Theorem 4.1

Let $k>0$, $\varepsilon\in(0,1)$, and $G$, $H$ be two graphs with identical $1$-WL stable colorings. Let $M$ be the set of ordered virtual nodes, $V_G$ and $V_H$ be the subset of nodes in $G$ and $H$ that have a color class of cardinality $1$, with $|V_G|=|V_H|=d$, and $W_G$, $W_H$ the subset of nod
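The premise of the theorem concerns pairs of graphs with identical $1$-WL stable colorings. As an illustration of that premise only (not of the theorem's conclusion, which is cut off above), the sketch below runs color refinement on a standard textbook pair: a 6-cycle and two disjoint triangles. The helper names `refine` and `cycle` are hypothetical, and refining the disjoint union is one convenient way to keep color names comparable across the two graphs.

```python
from collections import Counter


def refine(adj):
    """1-WL color refinement on an adjacency dict {node: [neighbors]}.

    Returns the stable coloring as {node: color id}.
    """
    colors = {v: 0 for v in adj}
    while True:
        # A node's signature is its own color plus the sorted
        # multiset of its neighbors' colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # Each round only splits classes, so an unchanged class count
        # means the partition is stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


def cycle(nodes):
    """Adjacency dict of a simple cycle over the given node labels."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}


hexagon = cycle([0, 1, 2, 3, 4, 5])
triangles = {**cycle([6, 7, 8]), **cycle([9, 10, 11])}

# Refine the disjoint union so both graphs share one color palette.
colors = refine({**hexagon, **triangles})
hist_hexagon = Counter(colors[v] for v in hexagon)
hist_triangles = Counter(colors[v] for v in triangles)
same_coloring = hist_hexagon == hist_triangles
```

Every node in both graphs has degree 2, so refinement never splits the initial color class and the two color histograms coincide even though the graphs are not isomorphic.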

Figures (7)

  • Figure 1: Overview of how IPR-MPNNs implicitly rewire a graph by adding virtual nodes. IPR-MPNNs use an upstream MPNN to learn priors $\bm{\theta}$ for connecting original nodes with virtual nodes via edges, parameterizing a probability mass function conditioned on exactly-$k$ constraints. Subsequently, we sample exactly $k$ edges from this distribution for each original node, connecting it to $k$ virtual nodes. We input the resulting graph to a downstream model, typically an MPNN, for the final prediction task, propagating information (1) from original nodes to virtual nodes, (2) among virtual nodes, and (3) among original nodes. On the backward pass, the gradients of the loss $\ell$ with respect to the parameters $\bm{\theta}$ are approximated through the derivative of the exactly-$k$ marginals.
  • Figure 2: Comparing model sensitivity across different layers for the two most distant nodes in graphs from the Zinc dataset. On the left, we compare the sensitivity of models with a varying number of layers. IPR-MPNNs maintain a high sensitivity even for the last layer, while the sensitivity of the base models decays to $0$. On the right, we compare models with different numbers of virtual nodes and observe that the results are similar for all variants.
  • Figure 3: We compute the log of the total effective resistance (Black et al., 2023) of five molecular datasets before and after rewiring the graphs using virtual nodes. Our rewiring technique consistently lowers the total effective resistance, indicating better information flow on all of the datasets.
  • Figure A4: IPR-MPNN obtains perfect accuracy on Trees-NeighborsMatch (Alon and Yahav, 2020) for depths up to 6.
  • Figure A5: Possible new configurations.
  • ...and 2 more figures
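The effective-resistance comparison in Figure 3 can be illustrated on a toy graph. The sketch below is a minimal example under stated assumptions, not the paper's evaluation code: it uses a 4-node path, attaches a single virtual node to every original node, and sums pairwise effective resistances over original node pairs only (whether the paper also counts virtual nodes is not stated here). Resistances come from the identity $R_{uv} = M_{uu} + M_{vv} - 2M_{uv}$ with $M = (L + \tfrac{1}{n}J)^{-1}$, where $L$ is the graph Laplacian and $J$ the all-ones matrix; exact fractions avoid rounding. The names `invert` and `total_resistance` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def invert(matrix):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(matrix)
    aug = [
        list(row) + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def total_resistance(num_nodes, edges, num_original):
    """Sum of effective resistances over pairs of original nodes.

    Assumes the graph is connected, so L + J/n is invertible.
    """
    shift = Fraction(1, num_nodes)
    m = [[shift for _ in range(num_nodes)] for _ in range(num_nodes)]
    for u, v in edges:
        m[u][u] += 1
        m[v][v] += 1
        m[u][v] -= 1
        m[v][u] -= 1
    inv = invert(m)
    return sum(
        inv[u][u] + inv[v][v] - 2 * inv[u][v]
        for u, v in combinations(range(num_original), 2)
    )


path = [(0, 1), (1, 2), (2, 3)]   # original graph: a 4-node path
virtual = 4                       # index of the added virtual node
rewired = path + [(v, virtual) for v in range(4)]

before = total_resistance(4, path, num_original=4)
after = total_resistance(5, rewired, num_original=4)
```

On a tree, effective resistance equals shortest-path distance, so `before` is the sum of pairwise distances on the path. Adding edges can only lower the resistance between existing nodes (Rayleigh monotonicity), so `after` is smaller than `before`.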

Theorems & Definitions (11)

  • Theorem 4.1
  • Corollary 4.1.1
  • Lemma E.1
  • proof
  • Lemma E.2
  • proof
  • Lemma E.3 (Qian et al., 2023; Morris et al., 2019)
  • Theorem E.4
  • proof
  • Corollary E.4.1
  • ...and 1 more