Crane: An Accurate and Scalable Neural Sketch for Graph Stream Summarization

Boyan Wang, Zhuochen Fan, Dayu Wang, Fangcheng Fu, Zeyu Luan, Lei Zou, Qing Li, Tong Yang

TL;DR

Crane is a hierarchical neural sketch architecture for graph stream summarization. Its hierarchical carry mechanism automatically elevates frequent items to higher memory layers, reducing interference between frequent and infrequent items within the same layer.

Abstract

Graph streams are rapidly evolving sequences of edges that convey continuously changing relationships among entities, playing a crucial role in domains such as networking, finance, and cybersecurity. Their massive scale and high dynamism make it challenging to obtain accurate statistics under limited memory. Traditional methods summarize graph streams through hand-crafted sketches, while recent studies have begun to replace these sketches with neural counterparts to improve adaptability and accuracy. However, this shift faces a major challenge: under limited memory, dominant frequent items tend to overshadow rare ones, hindering the neural network's ability to recover accurate statistics. To address this, we propose Crane, a hierarchical neural sketch architecture for graph stream summarization. Crane uses a hierarchical carry mechanism that automatically elevates frequent items to higher memory layers, reducing interference between frequent and infrequent items within the same layer. To better accommodate real-world deployment, Crane further adopts an adaptive memory expansion strategy that dynamically adds new layers once the occupancy of the top layer exceeds a threshold, enabling scalability across diverse data magnitudes. Extensive experiments on various datasets ranging from 20K to 60M edges demonstrate that Crane reduces estimation error by roughly 10x compared to state-of-the-art methods.
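
The carry idea in the abstract can be illustrated with a deliberately simplified, non-neural analogy. The sketch below is not Crane's architecture: it assumes exact per-edge counters in each layer instead of learnable memory, and it adds a layer on demand when a carry overflows the top layer rather than using the occupancy threshold the abstract describes. The class name `ToyCarryCounter`, its methods, and the value `theta=4` are all hypothetical, chosen only for illustration.

```python
class ToyCarryCounter:
    """Toy analogy only: exact counters with a base-theta carry between layers.

    Assumption: each layer stores a small count per edge (always < theta).
    Crane itself uses learnable neural memory, which this does not model.
    """

    def __init__(self, theta=4):
        if theta < 2:
            raise ValueError("theta must be at least 2")
        self.theta = theta      # promotion threshold (hypothetical value)
        self.layers = [{}]      # layers[i] maps edge -> count at level i

    def insert(self, edge):
        """Record one occurrence of edge, carrying upward when a count hits theta."""
        level = 0
        while True:
            if level == len(self.layers):
                # Simplification: grow on demand instead of on top-layer occupancy.
                self.layers.append({})
            count = self.layers[level].get(edge, 0) + 1
            if count < self.theta:
                self.layers[level][edge] = count
                return
            # Count reached theta: clear it here and carry one unit upward.
            self.layers[level].pop(edge, None)
            level += 1

    def query(self, edge):
        """Decode the frequency as a base-theta number across layers."""
        return sum(
            layer.get(edge, 0) * self.theta ** i
            for i, layer in enumerate(self.layers)
        )


sketch = ToyCarryCounter(theta=4)
for _ in range(37):
    sketch.insert(("a", "b"))   # frequent edge
for _ in range(2):
    sketch.insert(("c", "d"))   # rare edge

print(sketch.query(("a", "b")), sketch.query(("c", "d")), len(sketch.layers))
```

In this toy, most of the frequent edge's mass ends up in the upper layers while the rare edge never leaves the bottom layer, which is the separation between frequent and infrequent items that the carry mechanism is meant to provide.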

Paper Structure

This paper contains 46 sections, 6 theorems, 34 equations, 7 figures, 3 tables, 3 algorithms.

Key Result

Lemma 1

Given a promotion threshold $\theta$, the number of layers $L$ required to accurately represent an edge with maximum frequency $F_{max}$ grows logarithmically in $F_{max}$: $L \geq \log_\theta(F_{max} + 1)$. Consequently, the total space complexity relative to the stream volume is logarithmic.
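
To make the bound concrete, the following sketch computes the smallest integer $L$ satisfying $L \geq \log_\theta(F_{max} + 1)$, which is equivalent to $\theta^L \geq F_{max} + 1$. The function name `min_layers` and the sample values of $\theta$ and $F_{max}$ are assumptions for illustration and are not taken from the paper. Integer arithmetic is used instead of a floating-point logarithm to avoid rounding errors at exact powers of $\theta$.

```python
def min_layers(theta, f_max):
    """Smallest integer L with theta**L >= f_max + 1 (the bound in Lemma 1)."""
    if theta < 2 or f_max < 0:
        raise ValueError("need theta >= 2 and f_max >= 0")
    layers, capacity = 0, 1
    # capacity tracks theta**layers; stop once it covers f_max + 1.
    while capacity < f_max + 1:
        capacity *= theta
        layers += 1
    return layers


# Hypothetical settings, for illustration only.
print(min_layers(4, 37))          # 4**3 = 64 >= 38
print(min_layers(16, 10**6))      # 16**5 = 1048576 >= 1000001
print(min_layers(2, 60_000_000))  # 2**26 = 67108864 >= 60000001
```

Under these assumed values, raising $F_{max}$ from 37 to one million with a larger threshold increases the layer count only from 3 to 5, which is the logarithmic growth the lemma states.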

Figures (7)

  • Figure 1: Overview of the motivating example.
  • Figure 2: The overall architecture of Crane.
  • Figure 3: Hierarchical Learnable Memory.
  • Figure 4: Hierarchical Carry Mechanism and Automatic Memory Expansion Strategy.
  • Figure 5: Robustness comparison.
  • ...and 2 more figures

Theorems & Definitions (6)

  • Lemma 1: Logarithmic Space Efficiency
  • Theorem 2: Logarithmic Amortized Time Complexity
  • Theorem 3: Exponential Decay of Collision Probability
  • Theorem 4: Residual Error Bound
  • Theorem 5: Variance Minimization via Linear Decoding
  • Theorem 6: Interference Isolation