EdgeSketch: Efficient Analysis of Massive Graph Streams
Jakub Lemiesz, Dingqi Yang, Philippe Cudré-Mauroux
TL;DR
EdgeSketch introduces a compact, universal graph representation built in a single pass over a graph edge stream. It provides unbiased estimators for fundamental graph statistics and lets algorithms run directly on the sketch. It combines ideas from ExpSketch, FastExpSketch, and NodeSketch to store per-node sketches (F and S arrays) that support set operations, degree and edge-count estimation, and subgraph queries, and it handles directed and undirected graphs as well as hypergraphs. The paper demonstrates practical benefits through two core applications, community detection via a sketch-based Louvain method and graph reconstruction, showing substantial memory savings and runtime improvements with controlled accuracy, underpinned by analytical variance bounds that decay as $O(1/m)$. The approach scales to massive graphs (e.g., a stochastic block model (SBM) with $|E| \approx 2.85\times 10^8$ and real-world bipartite-to-unipartite graphs with billions of edges), supporting streaming, multi-task graph analysis from compact sketches when access to the full graph is infeasible.
Abstract
We introduce EdgeSketch, a compact graph representation for efficient analysis of massive graph streams. EdgeSketch provides unbiased estimators for key graph properties with controllable variance and supports running graph algorithms directly on the stored summary. It is constructed in a fully streaming manner, requiring a single pass over the edge stream, while offline analysis relies solely on the sketch. We evaluate the proposed approach on two representative applications: community detection via the Louvain method and graph reconstruction through node similarity estimation. Experiments demonstrate substantial memory savings and runtime improvements over both lossless representations and prior sketching approaches, while maintaining reliable accuracy.
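To make the idea of single-pass, per-node sketches with unbiased estimators more concrete, the following is a minimal illustration in the style of exponential-minimum sketches (the family ExpSketch belongs to). It is not the paper's data structure: the layout of the F and S arrays is not described in this summary, so the code keeps only one array of minima per node. The names `exp_hash`, `build_sketches`, `estimate_cardinality`, and `union`, the sketch size `M = 256`, and the reading of $m$ as the number of slots per sketch are all assumptions made for illustration.

```python
import hashlib
import math
from collections import defaultdict

M = 256  # hypothetical sketch size (slots per node), not taken from the paper


def exp_hash(item, slot):
    """Deterministic pseudo-random Exp(1) value for an (item, slot) pair."""
    digest = hashlib.blake2b(f"{item}|{slot}".encode(), digest_size=8).digest()
    x = int.from_bytes(digest, "big") >> 11      # keep 53 bits
    u = (x + 1) / (2**53 + 1)                    # uniform, strictly in (0, 1)
    return -math.log(u)


def build_sketches(edge_stream, m=M):
    """One pass over an undirected edge stream; one sketch per node.

    Slot k of a node's sketch holds the minimum of exp_hash(neighbor, k)
    over all neighbors seen so far, so repeated edges change nothing.
    """
    sketches = defaultdict(lambda: [math.inf] * m)
    for u, v in edge_stream:
        for node, nbr in ((u, v), (v, u)):
            sk = sketches[node]
            for k in range(m):
                h = exp_hash(nbr, k)
                if h < sk[k]:
                    sk[k] = h
    return sketches


def estimate_cardinality(sk):
    """Estimate the number of distinct elements summarized by a sketch.

    For m independent minima of Exp(1) values over n distinct elements,
    (m - 1) / sum(sk) is unbiased for n with variance n^2 / (m - 2).
    """
    return (len(sk) - 1) / sum(sk)


def union(a, b):
    """Sketch of the union of two neighbor sets: slot-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


if __name__ == "__main__":
    # Star graph: node 0 is connected to nodes 1..200.
    edges = [(0, i) for i in range(1, 201)]
    sketches = build_sketches(edges)
    print("estimated degree of node 0:", estimate_cardinality(sketches[0]))
```

Under these assumptions the relative variance of the degree estimate is $1/(m-2)$, which matches the $O(1/m)$ decay mentioned in the TL;DR, and the slot-wise minimum shows how set union can be carried out on the sketches alone, without access to the original edges.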
