Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

Andrea Ceni, Alessio Gravina, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Schonlieb, Moshe Eliasof

TL;DR

MP-SSM integrates modern state-space modeling into the message-passing framework in a principled way, enabling stable, long-range information propagation on both static and temporal graphs while preserving permutation equivariance. The core idea is a linear diffusion on graphs via the recurrence $\mathbf{X}_{t+1} = \mathbf{A}\mathbf{X}_t\mathbf{W} + \mathbf{U}_{t+1}\mathbf{B}$, followed by a graph-agnostic MLP; stacking such blocks yields large effective receptive fields without relying on nonlinear diffusion. The authors provide an exact Jacobian-based sensitivity analysis, derive lower bounds on gradient flow, and show that MP-SSM mitigates over-squashing and vanishing gradients in the deep regime, supported by a fast parallel implementation. Empirically, MP-SSM achieves state-of-the-art or strong performance on long-range propagation, heterophilic, and spatiotemporal forecasting benchmarks, with runtimes comparable to standard GCNs. These results position MP-SSM as a versatile and scalable framework for graph learning, with rigorous theoretical grounding and broad applicability.
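To make the shapes in this recurrence concrete, here is a minimal NumPy sketch of one block and of stacking two blocks on a static graph. It is not the authors' code. Assumptions: $\mathbf{X}_0 = \mathbf{0}$; $\mathbf{A}$ is the symmetrically normalized adjacency with self-loops (consistent with the title of Lemma 3.5); the MLP is a two-layer ReLU network; there are no residual connections or normalization. The helper names (`linear_recurrence`, `node_mlp`, `mp_ssm_block`, `init_block`) are hypothetical.

```python
import numpy as np


def sym_norm_adj(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (adj + I) D^-1/2."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]


def linear_recurrence(A, U_seq, W, B):
    """Apply X_{t+1} = A X_t W + U_{t+1} B over the input sequence, from X_0 = 0 (assumed)."""
    X = np.zeros((A.shape[0], W.shape[0]))
    for U in U_seq:
        X = A @ X @ W + U @ B
    return X


def node_mlp(X, W1, b1, W2, b2):
    """Graph-agnostic two-layer ReLU MLP applied to every node (row) independently (assumed form)."""
    return np.maximum(X @ W1 + b1, 0.0) @ W2 + b2


def mp_ssm_block(A, U_seq, p):
    """Hypothetical block: linear graph recurrence followed by a node-wise MLP."""
    X = linear_recurrence(A, U_seq, p["W"], p["B"])
    return node_mlp(X, p["W1"], p["b1"], p["W2"], p["b2"])


def init_block(rng, c_in, d):
    """Random parameters; W is scaled down so repeated multiplication stays bounded (assumed)."""
    return {
        "W": 0.3 * rng.standard_normal((d, d)),
        "B": rng.standard_normal((c_in, d)) / np.sqrt(c_in),
        "W1": rng.standard_normal((d, d)) / np.sqrt(d),
        "b1": np.zeros(d),
        "W2": rng.standard_normal((d, d)) / np.sqrt(d),
        "b2": np.zeros(d),
    }


rng = np.random.default_rng(0)
n, c_in, d, steps = 5, 3, 8, 3
adj = np.zeros((n, n))
for i in range(n - 1):  # path graph 0-1-2-3-4
    adj[i, i + 1] = adj[i + 1, i] = 1.0
A = sym_norm_adj(adj)

H = rng.standard_normal((n, c_in))       # static node features U_1
blocks = [init_block(rng, c_in, d), init_block(rng, d, d)]
for p in blocks:                          # stack two blocks
    H = mp_ssm_block(A, [H] * steps, p)   # static case: U = [U_1, U_1, U_1]
print(H.shape)                            # (5, 8)
```

In this sketch, each block's output is repeated over the recurrence steps and fed to the next block as its input sequence. All graph mixing happens in the linear recurrence, and the only nonlinearity is the node-wise MLP.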

Abstract

The recent success of State-Space Models (SSMs) in sequence modeling has motivated their adaptation to graph learning, giving rise to Graph State-Space Models (GSSMs). However, existing GSSMs operate by applying SSM modules to sequences extracted from graphs, often compromising core properties such as permutation equivariance, message-passing compatibility, and computational efficiency. In this paper, we introduce a new perspective by embedding the key principles of modern SSM computation directly into the Message-Passing Neural Network framework, resulting in a unified methodology for both static and temporal graphs. Our approach, MP-SSM, enables efficient, permutation-equivariant, and long-range information propagation while preserving the architectural simplicity of message passing. Crucially, MP-SSM enables an exact sensitivity analysis, which we use to theoretically characterize information flow and evaluate issues like vanishing gradients and over-squashing in the deep regime. Furthermore, our design choices allow for a highly optimized parallel implementation akin to modern SSMs. We validate MP-SSM across a wide range of tasks, including node classification, graph property prediction, long-range benchmarks, and spatiotemporal forecasting, demonstrating both its versatility and strong empirical performance.
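The permutation-equivariance property can be checked numerically on a toy block. The sketch below assumes the recurrence $\mathbf{X}_{t+1} = \mathbf{A}\mathbf{X}_t\mathbf{W} + \mathbf{U}_{t+1}\mathbf{B}$ with $\mathbf{X}_0 = \mathbf{0}$, followed by a row-wise tanh MLP (an assumed form). Relabeling the nodes with a permutation matrix $\mathbf{P}$ (so $\mathbf{A} \mapsto \mathbf{P}\mathbf{A}\mathbf{P}^\top$ and $\mathbf{U}_t \mapsto \mathbf{P}\mathbf{U}_t$) should permute the output rows and change nothing else. The function `toy_block` and all variable names are illustrative.

```python
import numpy as np


def toy_block(A, U_seq, W, B, W1, W2):
    """Illustrative block: X_{t+1} = A X_t W + U_{t+1} B from X_0 = 0,
    followed by a row-wise MLP tanh(X W1) W2 (assumed form)."""
    X = np.zeros((A.shape[0], W.shape[0]))
    for U in U_seq:
        X = A @ X @ W + U @ B
    return np.tanh(X @ W1) @ W2


rng = np.random.default_rng(1)
n, c, d = 6, 2, 4
upper = np.triu((rng.random((n, n)) < 0.4).astype(float), 1)
adj = upper + upper.T                       # random undirected graph
a_tilde = adj + np.eye(n)                   # add self-loops
deg = a_tilde.sum(axis=1)
A = a_tilde / np.sqrt(np.outer(deg, deg))   # D^-1/2 (adj + I) D^-1/2

U_seq = [rng.standard_normal((n, c)) for _ in range(3)]  # temporal inputs U_1..U_3
W = 0.5 * rng.standard_normal((d, d))
B = rng.standard_normal((c, d))
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))

P = np.eye(n)[rng.permutation(n)]           # random permutation matrix
out = toy_block(A, U_seq, W, B, W1, W2)
out_perm = toy_block(P @ A @ P.T, [P @ U for U in U_seq], W, B, W1, W2)
print(np.allclose(out_perm, P @ out))       # True
```

The check passes because every step either multiplies by $\mathbf{A}$ on the left, which commutes with relabeling once $\mathbf{A}$ is permuted consistently, or acts on each node's row independently.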

Paper Structure

This paper contains 43 sections, 6 theorems, 40 equations, 3 figures, 12 tables, and 1 algorithm.

Key Result

Theorem 3.4

The Jacobian of the linear recurrent equation of an MP-SSM block, from node $j$ at layer $s$ to node $i$ at layer $t\geq s$, can be computed exactly, and it has the following form:
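The following hedged sketch shows why an exact form is available. It does not restate the theorem's formula. Assume the recurrence exactly as written in the TL;DR, with $\mathbf{A}$ and $\mathbf{W}$ shared across steps, node states stored as rows $\mathbf{x}_i^t$ of $\mathbf{X}_t \in \mathbb{R}^{n\times d}$, and inputs $\mathbf{U}_\tau$ that do not depend on $\mathbf{X}_s$. Unrolling from step $s$ to step $t$ and differentiating row $i$ with respect to row $j$ then gives the following, using the convention $\big(\partial \mathbf{x}_i^t / \partial \mathbf{x}_j^s\big)_{ab} = \partial (\mathbf{x}_i^t)_a / \partial (\mathbf{x}_j^s)_b$:

```latex
\mathbf{X}_{t} = \mathbf{A}^{t-s}\,\mathbf{X}_{s}\,\mathbf{W}^{t-s}
  + \sum_{\tau=s+1}^{t} \mathbf{A}^{t-\tau}\,\mathbf{U}_{\tau}\mathbf{B}\,\mathbf{W}^{t-\tau}
\quad\Longrightarrow\quad
\mathbf{x}_i^{t} = \sum_{j} \big(\mathbf{A}^{t-s}\big)_{ij}\,\mathbf{x}_j^{s}\,\mathbf{W}^{t-s} + (\text{terms independent of } \mathbf{X}_s),

\frac{\partial \mathbf{x}_i^{t}}{\partial \mathbf{x}_j^{s}}
  = \big(\mathbf{A}^{t-s}\big)_{ij}\,\big(\mathbf{W}^{t-s}\big)^{\top} \in \mathbb{R}^{d\times d}.
```

Under these assumptions, the graph enters the sensitivity only through the scalar $(\mathbf{A}^{t-s})_{ij}$. This is presumably why powers of the normalized adjacency (Lemma 3.5) matter for the deep-regime analysis.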

Figures (3)

  • Figure 1: Illustration of MP-SSM in the temporal and static cases, with a recurrence length of $k+1=3$. In the temporal case (left), node embeddings receive a new input at each time step, $\mathbf{U}=[\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3]$, while in the static case (right), the same fixed node embeddings are repeated, $\mathbf{U}=[\mathbf{U}_1, \mathbf{U}_1, \mathbf{U}_1]$. An MP-SSM block comprises a linear recurrence followed by a multilayer perceptron (MLP), and multiple MP-SSM blocks are stacked to construct a deep MP-SSM architecture.
  • Figure 2: A chain of six cliques (ten nodes each) connected via bridge nodes of degree 2. The two red nodes form the pair that minimizes the approximate-Jacobian quantity. The red nodes are $12$ hops apart, so their interaction can be considered long-range.
  • Figure 3: Inference time on a graph with $n=100$ nodes and $3058$ edges, input dimension $C=1$, $\text{hidden\_dim}=32$, and increasing lengths $k=10, 100, 500, 1000, 5000$. GCN denotes a standard $k$-layer GCN with $\tanh$ activations and no residual connections. GCN (weight sharing) is the same model, but with a single layer iterated $k$ times. Both MP-SSM models use a single block.
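Because the recurrence is linear in $\mathbf{X}_t$, the final state of a $k+1$ step recurrence equals an explicit sum of mutually independent terms. This is the kind of structure that lets modern SSMs evaluate recurrences in parallel. The sketch below assumes $\mathbf{X}_0 = \mathbf{0}$ and compares the step-by-step recurrence with the naive unrolled sum for the static and temporal input patterns of Figure 1 ($k+1=3$). It illustrates the identity only and is not the paper's optimized parallel implementation. The function names are hypothetical.

```python
import numpy as np
from numpy.linalg import matrix_power


def sequential(A, U_seq, W, B):
    """Step-by-step recurrence X_{t+1} = A X_t W + U_{t+1} B, from X_0 = 0 (assumed)."""
    X = np.zeros((A.shape[0], W.shape[0]))
    for U in U_seq:
        X = A @ X @ W + U @ B
    return X


def unrolled(A, U_seq, W, B):
    """Final state X_T = sum_{tau=1}^{T} A^(T-tau) U_tau B W^(T-tau).
    The summands do not depend on each other, so they could be computed in parallel."""
    T = len(U_seq)
    terms = [matrix_power(A, T - tau) @ U @ B @ matrix_power(W, T - tau)
             for tau, U in enumerate(U_seq, start=1)]
    return np.sum(terms, axis=0)


rng = np.random.default_rng(2)
n, c, d = 5, 1, 4
M = rng.random((n, n))
A = (M + M.T) / (2 * n)   # identity holds for any square A; symmetric mimics a normalized adjacency
W = rng.standard_normal((d, d)) / d
B = rng.standard_normal((c, d))
U1, U2, U3 = (rng.standard_normal((n, c)) for _ in range(3))

static = [U1, U1, U1]     # Figure 1, right: same input at every step
temporal = [U1, U2, U3]   # Figure 1, left: a new input at every step
for U_seq in (static, temporal):
    print(np.allclose(sequential(A, U_seq, W, B), unrolled(A, U_seq, W, B)))  # True
```

The naive sum recomputes matrix powers and is not efficient. It only shows that nothing in the recurrence forces strictly sequential evaluation. In contrast, a GCN with a $\tanh$ between layers has no such closed form.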

Theorems & Definitions (19)

  • Remark 3.1
  • Definition 3.2: Local sensitivity
  • Remark 3.3
  • Theorem 3.4: Exact Jacobian computation in MP-SSM
  • Lemma 3.5: Powers of symmetrically normalized adjacency with self-loops
  • Theorem 3.6: Approximation deep regime
  • Corollary 3.7: Lower bound minimum sensitivity
  • Remark 3.8: Bottleneck Topologies
  • Definition 3.9: Global sensitivity
  • Remark 3.10
  • ...and 9 more
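Lemma 3.5 concerns powers of the symmetrically normalized adjacency with self-loops, $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-1/2}(\mathbf{A}+\mathbf{I})\tilde{\mathbf{D}}^{-1/2}$. A standard spectral fact, checked numerically below, is that on a connected graph $\hat{\mathbf{A}}^k$ converges to the rank-one matrix with entries $\sqrt{\tilde d_i \tilde d_j}/\sum_l \tilde d_l$. Here $\tilde d_i$ is the degree including the self-loop. Self-loops make the graph aperiodic, so eigenvalue $1$ is simple and all others have modulus below $1$. Whether Lemma 3.5 and Theorem 3.6 are stated in exactly this form is an assumption. The example graph and the helper name are hypothetical.

```python
import numpy as np


def sym_norm_adj_with_self_loops(adj):
    """Return D^-1/2 (adj + I) D^-1/2 and the degrees that include the self-loop."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)
    return a_tilde / np.sqrt(np.outer(deg, deg)), deg


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]  # triangle with a two-node tail
n = 5
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

A_hat, deg = sym_norm_adj_with_self_loops(adj)
A_power = np.linalg.matrix_power(A_hat, 500)       # deep regime: many propagation steps
limit = np.sqrt(np.outer(deg, deg)) / deg.sum()    # sqrt(d_i d_j) / sum_l d_l
print(np.allclose(A_power, limit))                  # True
```

Under this limit, node pairs with small degrees receive the smallest entries of $\hat{\mathbf{A}}^k$. This is one way low-degree bottleneck regions can limit long-range sensitivity.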