Graph Mamba: Towards Learning on Graphs with State Space Models
Ali Behrouz, Farnoosh Hashemi
TL;DR
Graph Mamba Networks (GMNs) are a framework based on selective State Space Models (SSMs) that addresses the quadratic cost of graph transformers and the limitations of message passing. The design follows a recipe of four required steps plus one optional step: Neighborhood Tokenization, Token Ordering, Local Encoding, and a Bidirectional Selective SSM Encoder, with Positional/Structural Encodings (PE/SE) as the optional addition. The per-node complexity is $O(M\,s\,(m+1))$ and the total cost is $O(M\,s\,(m+1)\,|V| + |E|)$, which is linear in the number of nodes and edges for fixed $M$, $s$, and $m$. The authors establish universality results and report strong empirical performance on long-range, large-scale, and heterophilic graph benchmarks while using less memory than competitive baselines. The work shows that, with careful tokenization and selective SSMs, high performance is achievable without relying exclusively on attention-based transformers or heavy positional/structural encodings.
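To make the cost expression concrete, the following minimal sketch evaluates $M\,s\,(m+1)\,|V| + |E|$ with constant factors dropped and compares it with the $|V|^2$ term of dense attention. The function names are hypothetical, and $M$, $s$, and $m$ are treated here simply as positive integer hyperparameters, since the summary above does not define them.

```python
def gmn_cost(M, s, m, num_nodes, num_edges):
    """Evaluate M*s*(m+1)*|V| + |E| (constant factors dropped)."""
    per_node = M * s * (m + 1)          # per-node term from the summary
    return per_node * num_nodes + num_edges


def dense_attention_cost(num_nodes):
    """Pairwise attention over all nodes grows as |V|^2."""
    return num_nodes ** 2


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n in (1_000, 10_000, 100_000):
        e = 5 * n
        print(n, gmn_cost(4, 2, 3, n, e), dense_attention_cost(n))
```

With the hyperparameters held fixed, multiplying the number of nodes and edges by ten multiplies the first quantity by ten and the second by one hundred.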
Abstract
Graph Neural Networks (GNNs) have shown promising potential in graph representation learning. The majority of GNNs define a local message-passing mechanism, propagating information over the graph by stacking multiple layers. These methods, however, are known to suffer from two major limitations: over-squashing and poor capturing of long-range dependencies. Recently, Graph Transformers (GTs) emerged as a powerful alternative to Message-Passing Neural Networks (MPNNs). GTs, however, have quadratic computational cost, lack inductive biases on graph structures, and rely on complex Positional/Structural Encodings (PE/SE). In this paper, we show that while Transformers, complex message-passing, and PE/SE are sufficient for good performance in practice, none of them is necessary. Motivated by the recent success of State Space Models (SSMs), such as Mamba, we present Graph Mamba Networks (GMNs), a general framework for a new class of GNNs based on selective SSMs. We discuss and categorize the new challenges when adapting SSMs to graph-structured data, and present four required steps and one optional step to design GMNs, where we choose (1) Neighborhood Tokenization, (2) Token Ordering, (3) Architecture of Bidirectional Selective SSM Encoder, (4) Local Encoding, and the optional (5) PE and SE. We further provide theoretical justification for the power of GMNs. Experiments demonstrate that, despite a much lower computational cost, GMNs attain outstanding performance on long-range, small-scale, large-scale, and heterophilic benchmark datasets.
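The four required steps can be illustrated with a toy, pure-Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: tokens are taken to be random-walk neighborhoods, they are ordered from the longest walk length to the shortest, the local encoder is a plain mean over scalar node features, and a simple input-gated recurrence stands in for the selective SSM (it is not Mamba). All function names are hypothetical.

```python
import math
import random


def sample_neighborhood(adj, node, walk_len, num_walks, rng):
    """Step 1: token = nodes visited by `num_walks` random walks of length `walk_len`."""
    visited = {node}
    for _ in range(num_walks):
        cur = node
        for _ in range(walk_len):
            nbrs = adj[cur]
            if not nbrs:
                break
            cur = rng.choice(nbrs)
            visited.add(cur)
    return visited


def tokenize(adj, node, max_len, num_walks, rng):
    """Steps 1-2: one token per walk length, ordered longest first (assumed ordering)."""
    tokens = [sample_neighborhood(adj, node, k, num_walks, rng)
              for k in range(max_len + 1)]
    return tokens[::-1]  # the length-0 token {node} is scanned last


def encode_token(token, feats):
    """Local-encoding stand-in: mean of scalar node features in the token."""
    return sum(feats[u] for u in token) / len(token)


def gated_scan(xs):
    """Stand-in for a selective SSM: h_t = a_t*h_{t-1} + (1-a_t)*x_t, a_t = sigmoid(x_t)."""
    h, out = 0.0, []
    for x in xs:
        a = 1.0 / (1.0 + math.exp(-x))  # gate depends on the input
        h = a * h + (1.0 - a) * x
        out.append(h)
    return out


def bidirectional_scan(xs):
    """Run the scan in both directions and sum the aligned outputs."""
    fwd = gated_scan(xs)
    bwd = gated_scan(xs[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def encode_node(adj, feats, node, max_len=2, num_walks=3, seed=0):
    """Tokenize, order, locally encode, scan; read out the last position (assumed)."""
    rng = random.Random(seed)
    tokens = tokenize(adj, node, max_len, num_walks, rng)
    xs = [encode_token(t, feats) for t in tokens]
    return bidirectional_scan(xs)[-1]


if __name__ == "__main__":
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a path graph on four nodes
    feats = {0: 0.5, 1: -1.0, 2: 2.0, 3: 0.0}
    print(encode_node(adj, feats, node=1))
```

The sequence length per node in this sketch is `max_len + 1`, so the work per node does not depend on the size of the graph, which is the property that makes a sequence model over neighborhood tokens attractive compared with attention over all node pairs.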
