COMBA: Cross Batch Aggregation for Learning Large Graphs with Context Gating State Space Models

Jiajun Shen, Yufei Jin, Yi He, Xingquan Zhu

TL;DR

COMBA tackles the challenge of learning long-range dependencies on large homogeneous graphs by embedding graph structure into a batched, hop-aware state-space framework. It integrates three components (hop-aware local context, cross-batch aggregation, and context gating) to enable scalable, near-linear-time learning while preserving global information. The authors provide a theoretical guarantee that cross-batch aggregation reduces approximation error relative to isolated batch training, and empirically show superior accuracy and robustness across six benchmark graphs, with favorable scalability. This approach offers a practical pathway to apply state-space models to graphs at scale, potentially informing future designs for efficient long-range graph modeling in real-world networks.

Abstract

State space models (SSMs) have recently emerged for modeling long-range dependencies in sequence data, at a much lower computational cost than modern alternatives such as transformers. Extending SSMs to graph-structured data, especially large graphs, is a significant challenge because SSMs are sequence models, and the sheer volume of large graphs makes it very expensive to convert graphs into sequences for effective learning. In this paper, we propose COMBA to tackle large graph learning using state space models, with two key innovations: graph context gating and cross batch aggregation. Graph context refers to the different hops of neighborhood around each node, and graph context gating allows COMBA to use such context to learn how best to control neighbor aggregation. For each graph context, COMBA samples nodes as batches and trains a graph neural network (GNN), with information aggregated across batches, allowing COMBA to scale to large graphs. Our theoretical study shows that cross-batch aggregation guarantees lower error than training a GNN without aggregation. Experiments on benchmark networks demonstrate significant performance gains compared to baseline approaches. Code and benchmark datasets will be released for public access.
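One common way to turn "different hops of neighborhood" into a sequence an SSM can consume is to stack repeatedly propagated features, giving each node one token per hop. The sketch below assumes symmetric-normalized propagation with self-loops and no learned weights; COMBA itself trains a GNN per hop, so this is only an illustration of the hop-wise sequence shape, not the authors' construction.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Symmetric-normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(num_nodes)
    for u, v in edges:
        a[u, v] = 1.0
        a[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

def hop_sequence(x, edges, num_hops):
    """Stack [X, A_hat X, A_hat^2 X, ..., A_hat^K X] into (num_nodes, K + 1, features).

    Assumption: parameter-free propagation stands in for COMBA's per-hop GNNs.
    Token k of a node's sequence summarizes its k-hop context.
    """
    a_hat = normalized_adjacency(edges, x.shape[0])
    hops = [x]
    for _ in range(num_hops):
        hops.append(a_hat @ hops[-1])
    return np.stack(hops, axis=1)

# Toy path graph 0-1-2-3 with 2-dimensional node features.
x = np.arange(8, dtype=float).reshape(4, 2)
edges = [(0, 1), (1, 2), (2, 3)]
seq = hop_sequence(x, edges, num_hops=2)
print(seq.shape)  # (4, 3, 2)
```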

Paper Structure (29 sections, 2 theorems, 39 equations, 4 figures, 3 tables, 3 algorithms)

Key Result

Theorem 1

Denote the number of batches per group as $d$, with the set of batches $\mathcal{B} = \{B_{1}, \dots, B_{d}\}$. Let $\mathcal{BI}$ be the set of all node indices appearing in $\mathcal{B}$, and define the set of all seed node indices as $\mathcal{S} = \bigcup_{i=1}^{d} s_i$, where $s_i$ are the seed node indices in batch $B_i$. The complement of $\mathcal{S}$ with respect to $\mathcal{BI}$ is then $\mathcal{S}^{\mathsf{c}} = \mathcal{BI} \setminus \mathcal{S}$. …
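The set notation in Theorem 1 can be made concrete with Python sets. The batch contents below are hypothetical, and the sketch only illustrates how $\mathcal{BI}$, $\mathcal{S}$, and $\mathcal{S}^{\mathsf{c}}$ relate. It says nothing about the error bound itself.

```python
def seed_partition(batches, seeds):
    """Return (S, S_c) for the notation of Theorem 1.

    batches: list of sets, all node indices appearing in each batch B_i
             (seed nodes plus their sampled neighbors)
    seeds:   list of sets, the seed node indices s_i of each batch
    """
    bi = set().union(*batches)  # BI: every node index appearing in some batch
    s = set().union(*seeds)     # S = s_1 ∪ ... ∪ s_d
    return s, bi - s            # S^c = BI \ S (non-seed nodes pulled in as neighbors)

# Hypothetical group of d = 2 batches.
batches = [{0, 1, 4}, {2, 3, 1}]
seeds = [{0, 1}, {2, 3}]
s, s_c = seed_partition(batches, seeds)
print(sorted(s), sorted(s_c))  # [0, 1, 2, 3] [4]
```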

Figures (4)

  • Figure 1: Cross batch aggregation process. From left to right: given a graph in ①, the nodes are partitioned into batches (only two batches are shown in ②). For each batch, COMBA first finds each node's $\hat{k}$-hop neighbors and forms a subgraph, shown in ③ ($\hat{k}=2$ in this case). A GNN is trained on each of the 1-hop, 2-hop, and $k$-hop based adjacency matrices. When training each hop's GNN, information from other batches is used to help learn the embeddings of the current batch's nodes. For example, in ④, node $a_2$ in Batch 1 aggregates information from $a_4$ in Batch 2. Cross batch aggregation allows all GNNs being trained to collectively help each other.
  • Figure 2: Illustration of the COMBA block with context gating, from left to right. The input sequence $\mathcal{Z'}$ is first processed by the S4 module to produce hop-wise representations. A context gating mechanism $C$ is then applied over the local hop window $z_{k-w:k+w}$ to refine each hop embedding $y_k$. The gated outputs are concatenated with the original node features, forming a new sequence $Y$. Pooling along the hop dimension aggregates the sequence into the embedding $X'$ for downstream tasks.
  • Figure 3: The proposed COMBA framework on a large homogeneous graph, from left to right. ①: nodes of a homogeneous graph are partitioned into $\hat{m}$ batches in ②. ③: for each batch, COMBA identifies the $\hat{k}$-hop neighbors of target nodes and constructs a corresponding subgraph. ④: node embeddings are updated across batches via cross batch aggregation, as illustrated in Figure 1. The resulting sequence is passed into the COMBA block with context gating ⑤, as illustrated in Figure 2. ⑥: the final predictions $\hat{Y}$ for all nodes are obtained and optimized using the cross-entropy loss.
  • Figure 4: Log average runtime per epoch ($y$-axis) using fixed batch sizes vs. the sum of number of nodes and number of edges in log scale ($x$-axis).
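The cross batch aggregation in Figure 1 can be sketched as batches that read from and write to one shared embedding table, so a seed node can aggregate from a neighbor that belongs to another batch (as $a_2$ reads $a_4$). This is a minimal sketch under stated assumptions: a single mean aggregator with a hypothetical weight matrix and tanh stands in for COMBA's per-hop GNNs, and batches are processed sequentially.

```python
import numpy as np

def cross_batch_step(h, adj_list, batch_seeds, weight):
    """One sequential pass over batches that share a global embedding table.

    h:           (num_nodes, dim) embedding table shared by all batches
    adj_list:    dict node -> neighbors in the full graph (not only within the batch)
    batch_seeds: list of lists, seed nodes of each batch
    weight:      (dim, dim) hypothetical layer weight (assumption, not from the paper)
    """
    h = h.copy()  # do not mutate the caller's table
    for seeds in batch_seeds:
        updated = {}
        for v in seeds:
            nbrs = adj_list.get(v, [])
            # Neighbors may live in other batches; reading their rows from the shared
            # table is what lets information flow across batches.
            msg = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
            updated[v] = np.tanh((h[v] + msg) @ weight)
        for v, row in updated.items():
            h[v] = row  # write back so later batches see this batch's updates
    return h

# Two batches {0, 1} and {2, 3}; edge 1-3 crosses the batch boundary.
dim = 2
h0 = np.zeros((4, dim))
h0[3] = 1.0
adj = {0: [1], 1: [0, 3], 2: [3], 3: [1, 2]}
h1 = cross_batch_step(h0, adj, [[0, 1], [2, 3]], np.eye(dim))
print(h1[1])  # nonzero only because node 1 read node 3 from the other batch
```

Without the shared table, node 1 would see only node 0 and its embedding would stay at zero in this toy setup.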
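The data flow of the COMBA block in Figure 2 (sequence model over hops, context gating over a local window $z_{k-w:k+w}$, concatenation with the original features, pooling over hops) can be traced with NumPy. Several assumptions apply and are labeled in the code. A diagonal linear recurrence stands in for S4. The gate is a sigmoid of the window mean times a random matrix. The "concatenation" appends the original features as an extra hop token, and pooling is a plain mean. This is an illustration of the shapes and ordering only.

```python
import numpy as np

def diagonal_ssm(z, decay, in_scale, out_scale):
    """Stand-in for S4 (assumption): diagonal linear recurrence scanned over hops.

    z: (num_nodes, num_hops, dim); decay, in_scale, out_scale: (dim,) vectors.
    s_k = decay * s_{k-1} + in_scale * z_k,  y_k = out_scale * s_k
    """
    s = np.zeros((z.shape[0], z.shape[2]))
    ys = []
    for k in range(z.shape[1]):
        s = decay * s + in_scale * z[:, k, :]
        ys.append(out_scale * s)
    return np.stack(ys, axis=1)

def context_gate(z, y, w, gate_weight):
    """Scale y_k by sigmoid(mean(z_{k-w..k+w}) @ gate_weight); window clipped at the ends."""
    num_hops = z.shape[1]
    out = np.empty_like(y)
    for k in range(num_hops):
        window = z[:, max(0, k - w):min(num_hops, k + w + 1), :].mean(axis=1)
        gate = 1.0 / (1.0 + np.exp(-(window @ gate_weight)))  # values in (0, 1)
        out[:, k, :] = gate * y[:, k, :]
    return out

def comba_block(z, x, w=1, seed=0):
    """Hypothetical end-to-end block: SSM -> gating -> concat with x -> mean pooling."""
    rng = np.random.default_rng(seed)
    dim = z.shape[2]
    y = diagonal_ssm(z, decay=np.full(dim, 0.9), in_scale=np.ones(dim), out_scale=np.ones(dim))
    y = context_gate(z, y, w, rng.normal(size=(dim, dim)))
    seq = np.concatenate([x[:, None, :], y], axis=1)  # Y: original features as an extra token
    return seq.mean(axis=1)  # pool over the hop dimension -> X'

rng = np.random.default_rng(1)
z = rng.normal(size=(5, 4, 3))  # 5 nodes, 4 hops, 3 features
x = z[:, 0, :]
out = comba_block(z, x)
print(out.shape)  # (5, 3)
```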
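Steps ② and ③ of Figure 3 (partition nodes into $\hat{m}$ batches, then collect the $\hat{k}$-hop neighborhood of each batch's target nodes) can be sketched with the standard library. The random partition is an assumption, since the captions do not state how batches are formed. The neighborhood is a multi-source breadth-first search truncated at depth $\hat{k}$, returning the induced subgraph.

```python
import random
from collections import deque

def partition_nodes(num_nodes, num_batches, seed=0):
    """Randomly split node ids into num_batches groups of target nodes (assumption)."""
    nodes = list(range(num_nodes))
    random.Random(seed).shuffle(nodes)
    return [sorted(nodes[i::num_batches]) for i in range(num_batches)]

def k_hop_subgraph(adj_list, targets, k):
    """Multi-source BFS from targets; return nodes within k hops and induced edges (u < v)."""
    dist = {v: 0 for v in targets}
    queue = deque(targets)
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj_list.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    edges = {(u, v) for u in nodes for v in adj_list.get(u, []) if v in nodes and u < v}
    return nodes, edges

# Path graph 0-1-2-3-4-5; 2-hop subgraph around target node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
nodes, edges = k_hop_subgraph(path, [0], 2)
print(sorted(nodes), sorted(edges))  # [0, 1, 2] [(0, 1), (1, 2)]
```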

Theorems & Definitions (3)

  • Theorem 1
  • Theorem
  • Proof 1