Sequential Signal Mixing Aggregation for Message Passing Graph Neural Networks

Mitchell Keren Taraday, Almog David, Chaim Baskin

TL;DR

Sequential Signal Mixing Aggregation (SSMA) addresses the limited neighbor mixing of sum-based aggregators in MPGNNs by treating neighbor features as 2D discrete signals and applying a 2D circular convolution to achieve higher-order feature mixing. Grounded in a neighbor-mixing metric and an extended DeepSets polynomial framework, SSMA yields a representational size of $m=\mathcal{O}(n^2 d)$ and can be implemented efficiently via FFT-based convolutions. Empirically, SSMA provides substantial performance gains across a range of benchmarks (TU, ZINC, OGBN, LRGB) when plugged into existing MPGNN architectures, often attaining state-of-the-art results, with robust behavior in dense neighborhoods and long-range dependencies. The work also details practical considerations, including normalization, low-rank compression, and neighbor selection, and validates its claims on both synthetic and real-world tasks, supported by public code.

Abstract

Message Passing Graph Neural Networks (MPGNNs) have emerged as the preferred method for modeling complex interactions across diverse graph entities. While the theory of such models is well understood, their aggregation module has not received sufficient attention. Sum-based aggregators have solid theoretical foundations regarding their separation capabilities. However, practitioners often prefer using more complex aggregations and mixtures of diverse aggregations. In this work, we unveil a possible explanation for this gap. We claim that sum-based aggregators fail to "mix" features belonging to distinct neighbors, preventing them from succeeding at downstream tasks. To this end, we introduce Sequential Signal Mixing Aggregation (SSMA), a novel plug-and-play aggregation for MPGNNs. SSMA treats the neighbor features as 2D discrete signals and sequentially convolves them, inherently enhancing the ability to mix features attributed to distinct neighbors. By performing extensive experiments, we show that when combining SSMA with well-established MPGNN architectures, we achieve substantial performance gains across various benchmarks, achieving new state-of-the-art results in many settings. We published our code at https://almogdavid.github.io/SSMA/.

Paper Structure

This paper contains 50 sections, 6 theorems, 41 equations, 7 figures, 14 tables.

Key Result

Proposition 3.2

Let $\gamma(\{\!\{ x_1,\dots,x_n \}\!\}) = \rho\left(\sum_{k=1}^n \phi(x_k)\right)$, where $\phi: \mathbb{R}^d \rightarrow \mathbb{R}^m$ is a local operator and $\rho: \mathbb{R}^m \rightarrow \mathbb{R}^d$ is a pooling operator that is twice continuously differentiable. Then, we have $\forall i$ … where $J_\phi(\cdot)$ is the Jacobian matrix of $\phi$ and $H_{\rho^{(\ell)}}(\cdot)$ is the Hessian matrix …
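
As a hedged sketch of how a Jacobian and a Hessian of this kind can arise together (a plain chain-rule computation under the assumptions stated above, not necessarily the proposition's exact wording), write $s = \sum_{k=1}^n \phi(x_k)$ and let $\rho^{(\ell)}$ denote the $\ell$-th output coordinate of $\rho$. The symbol $s$, the component indices $a, b, p, q$, and the assumption that $\phi$ is differentiable are introduced here for illustration only. For $i \neq j$:

$$
\frac{\partial \gamma^{(\ell)}}{\partial x_j} = J_\phi(x_j)^\top \, \nabla \rho^{(\ell)}(s),
\qquad
\frac{\partial^2 \gamma^{(\ell)}}{\partial x_{i,a}\, \partial x_{j,b}}
= \sum_{p,q=1}^{m} \frac{\partial \phi_p}{\partial x_a}(x_i)\, \big[H_{\rho^{(\ell)}}(s)\big]_{pq}\, \frac{\partial \phi_q}{\partial x_b}(x_j)
= \big[\, J_\phi(x_i)^\top \, H_{\rho^{(\ell)}}(s) \, J_\phi(x_j) \,\big]_{ab}.
$$

The second identity uses the fact that $J_\phi(x_j)$ does not depend on $x_i$ when $i \neq j$, so only $\nabla \rho^{(\ell)}(s)$ is differentiated again. In particular, the mixed second derivative vanishes wherever $H_{\rho^{(\ell)}}$ is zero, for example when $\rho$ is linear.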

Figures (7)

  • Figure 1: An efficient and provable generalization of the DeepSets polynomial to vector features.
  • Figure 2: Visualization of the higher-order notion of neighbor mixing. We visualize the convolution result $h$ for $3$-dimensional features, considering $2$ neighbors $u,v$ (left) and $3$ neighbors $u,v,w$ (right). We demonstrate that, for each $n$-tuple selecting one feature per node, the corresponding $n$-th order derivative of exactly one entry of $h$ is $1$.
  • Figure 3: Visualization of the Sequential Signal Mixing Aggregation. Left: demonstration of the aggregation stage in an off-the-shelf MPGNN layer. The goal is to create a compressed view of $t$'s incoming neighbors. Right: our proposed aggregation. We convert the neighbor features into two-dimensional discrete signals. We then apply $2$D circular convolution by applying $2$D FFT, performing pointwise multiplication and transforming back using IFFT. Finally, we compress the result back into a $d$-dimensional vector using a multi-layer perceptron as a universal compressor.
  • Figure 4: SumOfGram train and test regression $L_1$ errors for different activation functions. The sum aggregator (not dashed) performs poorly and fails to scale with the capacity of the aggregation module, even when used in conjunction with analytic activations. On the contrary, SSMA (dashed) consistently achieves low regression errors and scales well with the number of learnable parameters.
  • Figure 5: Comparison of the neighbor selection methods across different neighbor counts and MPGNN layer types on the "OGBN-Arxiv" and "Proteins" datasets.
  • ...and 2 more figures
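
The convolution step described in the Figure 3 caption (2D FFT, pointwise multiplication, IFFT) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random 4x4 signals, and the neighbor count are hypothetical, the conversion of neighbor features into 2D signals is taken as already done, and the final MLP compression is omitted.

```python
import numpy as np

def circular_conv2d_fft(signals):
    """Circularly convolve a stack of 2D signals with shape (n, H, W) via FFT."""
    spectra = np.fft.fft2(signals, axes=(-2, -1))  # 2D FFT of each neighbor signal
    product = np.prod(spectra, axis=0)             # pointwise product over neighbors
    return np.real(np.fft.ifft2(product))          # transform back; inputs are real

def circular_conv2d_direct(a, b):
    """Reference circular convolution of two 2D signals by explicit summation."""
    H, W = a.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for p in range(H):
                for q in range(W):
                    out[i, j] += a[p, q] * b[(i - p) % H, (j - q) % W]
    return out

# Hypothetical input: 3 neighbors, each already mapped to a 4x4 signal.
rng = np.random.default_rng(0)
neighbors = rng.normal(size=(3, 4, 4))

h = circular_conv2d_fft(neighbors)

# Sequentially convolving the neighbors one by one gives the same result.
ref = circular_conv2d_direct(
    circular_conv2d_direct(neighbors[0], neighbors[1]), neighbors[2]
)
```

Because the pointwise product in the frequency domain is commutative, the result does not depend on the order in which neighbors are listed, which is the property an aggregation over a multiset of neighbors needs.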

Theorems & Definitions (11)

  • Definition 3.1
  • Proposition 3.2
  • Theorem 4.1
  • Theorem 4.2
  • proof
  • Lemma A.1
  • proof
  • Corollary A.2
  • proof
  • Proposition A.3
  • ...and 1 more