Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning

Haitz Sáez de Ocáriz Borde, Artem Lukoianov, Anastasis Kratsios, Michael Bronstein, Xiaowen Dong

TL;DR

This work proposes Scalable Message Passing Neural Networks (SMPNNs) and demonstrates that integrating standard convolutional message passing into a Pre-Layer Normalization Transformer-style block, in place of attention, yields high-performing deep message-passing-based Graph Neural Networks (GNNs).

Abstract

We propose Scalable Message Passing Neural Networks (SMPNNs) and demonstrate that, by integrating standard convolutional message passing into a Pre-Layer Normalization Transformer-style block instead of attention, we can produce high-performing deep message-passing-based Graph Neural Networks (GNNs). This modification yields results competitive with the state-of-the-art in large graph transductive learning, particularly outperforming the best Graph Transformers in the literature, without requiring the otherwise computationally and memory-expensive attention mechanism. Our architecture not only scales to large graphs but also makes it possible to construct deep message-passing networks, unlike simple GNNs, which have traditionally been constrained to shallow architectures due to oversmoothing. Moreover, we provide a new theoretical analysis of oversmoothing based on universal approximation which we use to motivate SMPNNs. We show that in the context of graph convolutions, residual connections are necessary for maintaining the universal approximation properties of downstream learners and that removing them can lead to a loss of universality.
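
To make the architectural idea concrete, the following is a minimal NumPy sketch of a Pre-Layer Normalization block in which a graph convolution takes the place of attention, followed by a pointwise feedforward network with SiLU activations. It is an illustration under stated assumptions, not the authors' implementation: the function names (`layer_norm`, `normalized_adjacency`, `smpnn_block_sketch`), the symmetric normalization with self-loops, the placement of the activations, the hidden width, and the absence of learnable LayerNorm parameters or biases are all choices made here for brevity.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each node's feature vector (no learnable scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def silu(x):
    # SiLU activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))


def normalized_adjacency(adj):
    # Assumed GCN-style normalization: D^{-1/2} (A + I) D^{-1/2}.
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def smpnn_block_sketch(x, a_hat, w_conv, w_ff1, w_ff2):
    # Sub-block 1: pre-LN, graph convolution instead of attention, residual.
    h = x + silu(a_hat @ layer_norm(x) @ w_conv)
    # Sub-block 2: pre-LN, pointwise feedforward with SiLU, residual.
    return h + silu(layer_norm(h) @ w_ff1) @ w_ff2


# Toy usage on a 5-node path graph with 8-dimensional node features.
rng = np.random.default_rng(0)
n, d = 5, 8
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
a_hat = normalized_adjacency(adj)

x = rng.normal(size=(n, d))
w_conv = 0.1 * rng.normal(size=(d, d))
w_ff1 = 0.1 * rng.normal(size=(d, 2 * d))
w_ff2 = 0.1 * rng.normal(size=(2 * d, d))

out = smpnn_block_sketch(x, a_hat, w_conv, w_ff1, w_ff2)
print(out.shape)
```

Because both sub-blocks are residual, setting every weight matrix to zero makes the block the identity map, which is one reason such blocks can be stacked deeply. The cost of the convolution scales with the number of edges when `a_hat` is stored sparsely, whereas dense attention scales with the square of the number of nodes.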

Paper Structure

This paper contains 17 sections, 7 theorems, 39 equations, 3 figures, 7 tables.

Key Result

Theorem 4.1

Let $N,D$ be positive integers with $N \ge 2$. Let $\mathcal{G}$ be a complete graph on $N$ nodes. For any weight matrix $\mathbf{W}\in \mathbb{R}^{D\times D}$, the class $\mathcal{F}_{\mathcal{G},\mathbf{W}}$ defined in equation eq:conv_wo_NOresidual is not a universal approximator in $\mathcal{C}
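
The intuition behind this result can be checked numerically. The class $\mathcal{F}_{\mathcal{G},\mathbf{W}}$ is not reproduced on this page, so the sketch below assumes a single linear graph convolution $\mathbf{X} \mapsto \hat{\mathbf{A}}\mathbf{X}\mathbf{W}$ with self-loops and symmetric normalization, for which $\hat{\mathbf{A}} = \frac{1}{N}\mathbf{1}\mathbf{1}^\top$ on the complete graph. Under that assumption every node receives the same averaged feature vector, so two different inputs with equal column means collapse to the same output, and no downstream learner can tell them apart. The function names are hypothetical, and this illustrates the idea rather than reproducing the paper's proof.

```python
import numpy as np


def conv_no_residual(x, a_hat, w):
    # Assumed form of the convolution without a residual connection.
    return a_hat @ x @ w


def conv_with_residual(x, a_hat, w):
    # Same convolution with the input added back.
    return x + a_hat @ x @ w


n, d = 4, 3
# Complete graph with self-loops under symmetric normalization: every entry is 1/N.
a_hat = np.full((n, n), 1.0 / n)

rng = np.random.default_rng(0)
w = rng.normal(size=(d, d))
x1 = rng.normal(size=(n, d))
x2 = x1[::-1].copy()  # same rows in reverse order: different input, same column means

# Do the two distinct inputs map to the same output?
same_without = np.allclose(conv_no_residual(x1, a_hat, w), conv_no_residual(x2, a_hat, w))
same_with = np.allclose(conv_with_residual(x1, a_hat, w), conv_with_residual(x2, a_hat, w))
print("collapsed without residual:", same_without)
print("collapsed with residual:", same_with)
```

Without the residual term the two inputs are indistinguishable regardless of $\mathbf{W}$, since the averaging step has already discarded the difference. With the residual term they remain distinct for this randomly drawn $\mathbf{W}$, which is consistent with the role the abstract assigns to residual connections.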

Figures (3)

  • Figure 1: The Scalable Message Passing Neural Network (SMPNN) architecture. Left: The full model is comprised of $\mathtt{N}$ transformer-style blocks stacked one after the other. The model also uses input and output feedforward layers to project node features to the hidden and output dimensions. Middle: Architecture of a single SMPNN block as described in the section "The Scalable Message Passing Block". Right: Zoom into the GCN block and the Pointwise FeedForward network with SiLU activation functions.
  • Figure 2: Max GPU consumption versus number of edges in the subgraph for SMPNN with 6 layers.
  • Figure 3: Max GPU consumption versus the number of nodes in the batch subgraph for different models with 6 layers.

Theorems & Definitions (14)

  • Definition 4.1: Graph Dirichlet Energy of a Message Passing System (Zhou2005RegularizationOD)
  • Definition 4.2: Graph Frequency Dominance (digiovanni2023understanding)
  • Definition 4.3: Universal Approximator
  • Theorem 4.1: No Universal Approximation via Graph Convolution Alone
  • Theorem 4.2: Universal Approximation via Graph Convolution is Possible with Residual Connections
  • Lemma B.2: Loss of Injectivity
  • proof: Proof of Lemma lem:no_univ0
  • Lemma B.3: No Universality without Residual Connection
  • proof
  • Lemma B.4: Injectivity Regained
  • ...and 4 more