
gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity

Hugh Blayney, Álvaro Arroyo, Xiaowen Dong, Michael M. Bronstein

TL;DR

The paper addresses over-squashing in graph neural networks by distinguishing two failure modes: capacity over-squashing (storage saturation) and sensitivity over-squashing (low Jacobian sensitivity). It introduces Neighbor Associative Recall (NAR) to isolate capacity effects and proposes gLSTM, a memory-augmented GNN that incorporates associative memory via a matrix memory and xLSTM-inspired gating, enhanced by k-hop aggregation. The approach yields strong results on the capacity-focused NAR task and competitive performance on real-world long-range benchmarks (GPP, LRGB), while ablations reveal that capacity and sensitivity can be decoupled. This work provides a new architectural direction for extending storage capacity in graphs, enabling better handling of long-range dependencies in networked data.
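The idea of pairing a matrix memory with xLSTM-style gating can be illustrated with a small NumPy sketch. It assumes an mLSTM-like update with three parts. Each neighbour j writes a gated outer product i_j v_j k_j^T into node i's memory, using sum aggregation over 1-hop neighbours. Node i applies a sigmoid forget gate to its previous memory. Node i then reads the memory with its own query. The function and parameter names, the gate placement, and the omission of the output gate, gate stabilisation and k-hop aggregation are assumptions for illustration. This is not the paper's gLSTM block.

```python
import numpy as np

def gated_matrix_memory_step(C, n, H, adj, Wq, Wk, Wv, wi, wf):
    """One hypothetical gLSTM-like update over all nodes (illustrative only).

    C:   (num_nodes, d, d) matrix memory per node
    n:   (num_nodes, d) normaliser state per node
    H:   (num_nodes, d_in) node features
    adj: (num_nodes, num_nodes) 0/1 adjacency; row i marks the neighbours of i
    """
    q = H @ Wq                                  # queries, (num_nodes, d)
    k = H @ Wk / np.sqrt(Wk.shape[1])           # keys, scaled by 1/sqrt(d) as in mLSTM
    v = H @ Wv                                  # values, (num_nodes, d)
    i_gate = np.exp(H @ wi)                     # exponential input gate (no stabiliser)
    f_gate = 1.0 / (1.0 + np.exp(-(H @ wf)))    # sigmoid forget gate
    # Each sender j contributes i_j * v_j k_j^T; summed over the receiver's neighbours.
    writes = i_gate[:, None, None] * np.einsum("jd,je->jde", v, k)
    agg_C = np.einsum("ij,jde->ide", adj, writes)
    agg_n = adj @ (i_gate[:, None] * k)
    C_new = f_gate[:, None, None] * C + agg_C
    n_new = f_gate[:, None] * n + agg_n
    # Retrieval: C q, normalised by max(|n^T q|, 1) as in mLSTM.
    num = np.einsum("ide,ie->id", C_new, q)
    denom = np.maximum(np.abs(np.einsum("id,id->i", n_new, q)), 1.0)
    h = num / denom[:, None]
    return C_new, n_new, h

# Usage: 4 nodes, node 3 has no neighbours.
rng = np.random.default_rng(0)
N, d_in, d = 4, 8, 6
H = rng.normal(size=(N, d_in))
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 0]], dtype=float)
params = [rng.normal(size=(d_in, d)) for _ in range(3)] + [rng.normal(size=d_in), rng.normal(size=d_in)]
C0, n0 = np.zeros((N, d, d)), np.zeros((N, d))
C1, n1, h1 = gated_matrix_memory_step(C0, n0, H, adj, *params)
```

Starting from an empty memory, an isolated node receives no writes and therefore retrieves a zero vector. Each connected node's memory is a gated sum of its neighbours' key-value outer products.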

Abstract

Graph Neural Networks (GNNs) leverage the graph structure to transmit information between nodes, typically through the message-passing mechanism. While these models have found a wide variety of applications, they are known to suffer from over-squashing, where information from a large receptive field of node representations is collapsed into a single fixed-size vector, resulting in an information bottleneck. In this paper, we re-examine the over-squashing phenomenon through the lens of model storage and retrieval capacity, which we define as the amount of information that can be stored in a node's representation for later use. We study some of the limitations of existing tasks used to measure over-squashing and introduce a new synthetic task to demonstrate that an information bottleneck can saturate this capacity. Furthermore, we adapt ideas from the sequence modeling literature on associative memories, fast weight programmers, and the xLSTM model to develop a novel GNN architecture with improved capacity. We demonstrate strong performance of this architecture both on our synthetic capacity task and on a range of real-world graph benchmarks.
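To make "storage capacity" concrete, the toy below stores key-value pairs in a d x d outer-product (associative) memory. It then measures how well each value is recovered by querying with its key. It is an illustration, not the paper's NAR task. With at most d orthonormal keys, recall is exact. Once more than d pairs are written, the keys can no longer be mutually orthogonal, and crosstalk between stored pairs makes the recall error nonzero. The helper `recall_error`, Gaussian values and unit-norm random keys are hypothetical choices.

```python
import numpy as np

def recall_error(num_pairs: int, dim: int, seed: int = 0) -> float:
    """Hypothetical helper: write num_pairs (key, value) pairs into a dim x dim
    memory M = sum_j v_j k_j^T and return the mean squared recall error."""
    rng = np.random.default_rng(seed)
    if num_pairs <= dim:
        # Orthonormal keys from a QR decomposition: no crosstalk between pairs.
        Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
        keys = Q[:, :num_pairs].T                      # (num_pairs, dim)
    else:
        # More pairs than dimensions: keys cannot all be orthogonal.
        keys = rng.normal(size=(num_pairs, dim))
        keys /= np.linalg.norm(keys, axis=1, keepdims=True)
    values = rng.normal(size=(num_pairs, dim))
    M = values.T @ keys                                # sum of outer products v_j k_j^T
    retrieved = keys @ M.T                             # row i equals M @ k_i
    return float(np.mean((retrieved - values) ** 2))

for n_pairs in (4, 8, 16, 32):
    print(n_pairs, recall_error(n_pairs, dim=8))
```

Because the Gram matrix of more than d keys in d dimensions has rank at most d, it cannot equal the identity, so some stored values are always corrupted. This is a simple picture of a fixed-size store saturating.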

Paper Structure

This paper contains 31 sections, 14 equations, 22 figures, 7 tables.

Figures (22)

  • Figure 1: Computational graphs. Left: RingTransfer (Di Giovanni et al., 2023). Middle: Tree-NeighborsMatch (Alon & Yahav, 2020). Right: NAR, introduced in the section on Neighbor Associative Recall. Nodes with informative features are green, background nodes gray. The red node is trained to solve the task.
  • Figure 2: Log Jacobian norms. "Deep" graphs are the binary trees of Tree-NeighborsMatch (Alon & Yahav, 2020); "Shallow" graphs are single-level trees with the same number of leaves. A GCN with depth equal to the tree depth acts on each. Jacobian norms are $\lVert \partial \mathbf{h}_r^{(L)} / \partial \mathbf{h}_l^{(0)} \rVert_1$ for root $r$ and leaf $l$ (red/green in Figure 1). Shaded areas show the standard deviation.
  • Figure 3: An example graph with $N=5$ from the NAR task. Key-value nodes are shown in blue, the central node in red and the query node in green. In this graph, $m$ is the randomly sampled index of the key-value node associated with query node $q$. The target for this graph is a one-hot vector corresponding to $v_m$.
  • Figure 4: gLSTM block structure. Gates are shown in orange, query/key/value in dark blue. "Aggr." denotes aggregation across neighborhoods. The symbols $\odot, \otimes, +, \cdot$ denote the Hadamard product, outer product, vector addition, and matrix multiplication, respectively.
  • Figure 5: Test-set mean accuracy (standard deviation shaded) on the NAR task for gLSTM and GCN models, plotted against hidden dimension in one panel and against the number of trainable parameters in the other. Note that gLSTM uses K-hop aggregation here, whereas GCN does not; see the appendix on expanded NAR results for performance separated by aggregation strategy.
  • ...and 17 more figures
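Based on the Figure 3 caption, an NAR instance can be sketched as below. The wiring and encodings are assumptions, not the paper's generator. Key-value nodes and the query node attach directly to the central node in a star. Keys are Gaussian vectors and values are class labels encoded one-hot. The query node carries the key of pair m with its value slot zeroed. `make_nar_graph` and its parameters are hypothetical names. The edge list uses a (2, num_edges) source/target array, a common GNN-library convention.

```python
import numpy as np

def make_nar_graph(num_pairs: int, key_dim: int, num_classes: int, seed: int = 0):
    """Hypothetical NAR-style instance (assumed star layout).

    Node 0                 -> central node, where the prediction is read out
    Nodes 1..num_pairs     -> key-value nodes, features [key, one-hot value]
    Node num_pairs + 1     -> query node, features [key of pair m, zeros]
    """
    rng = np.random.default_rng(seed)
    keys = rng.normal(size=(num_pairs, key_dim))
    value_ids = rng.integers(num_classes, size=num_pairs)
    values = np.eye(num_classes)[value_ids]          # one-hot value per pair
    m = int(rng.integers(num_pairs))                 # randomly sampled queried pair
    num_nodes = num_pairs + 2
    x = np.zeros((num_nodes, key_dim + num_classes))
    x[1:num_pairs + 1, :key_dim] = keys
    x[1:num_pairs + 1, key_dim:] = values
    x[num_pairs + 1, :key_dim] = keys[m]             # query node holds only the key
    # Undirected star: edges in both directions between node 0 and every other node.
    leaves = np.arange(1, num_nodes)
    hub = np.zeros_like(leaves)
    edge_index = np.stack([np.concatenate([hub, leaves]),
                           np.concatenate([leaves, hub])])
    y = values[m]                                    # one-hot target for the central node
    return x, edge_index, y, m

x, edge_index, y, m = make_nar_graph(num_pairs=5, key_dim=4, num_classes=3, seed=1)
```

A model solving this instance must compare the query key against the keys it has gathered at the central node and return the matching value, which is the associative-recall behaviour the task is meant to isolate.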