Buffered Streaming Edge Partitioning
Adil Chhabra, Marcelo Fonseca Faraj, Christian Schulz, Daniel Seemaier
TL;DR
This work tackles edge partitioning for massive graphs by introducing two buffered streaming algorithms, HeiStreamE and FreightE. HeiStreamE builds a CSPAC-based model (an adapted Split-and-Connect graph model) of each batch and partitions it with a Fennel-based multilevel scheme. This yields high-quality partitions in time and memory linear in the graph size and independent of the number of blocks $k$. FreightE instead partitions a hypergraph representation of the graph on the fly, assigning edges rapidly without constructing a CSPAC model. The authors detail the model construction, the batch processing strategy, and three modes for connectivity-aware batch modeling. They also report extensive parameter tuning and a comprehensive comparison against HDRF and variants of 2PS. Empirically, HeiStreamE generally achieves a lower replication factor than competing streaming methods and remains memory-efficient on real-world graphs with far more edges than vertices. FreightE partitions very quickly, especially for large $k$. Together, these properties make the two approaches practical for large-scale graph processing systems.
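To make the hypergraph view concrete, a common way to cast edge partitioning as hypergraph partitioning is to turn each graph edge into a hypergraph vertex and each graph vertex into a hyperedge (net) whose pins are its incident edges. The sketch below assumes this construction, which may differ in detail from FreightE's. It builds the representation for a small edge list and computes each net's connectivity $\lambda$, the number of distinct blocks among its pins, for a hypothetical edge-to-block assignment. The function names are illustrative, and the sketch does not reproduce FreightE's streaming, on-the-fly procedure.

```python
from collections import defaultdict


def build_dual_hypergraph(edges):
    """Map each graph vertex to a net whose pins are the indices of its incident edges.

    Assumed construction: hypergraph vertices are edge indices 0..len(edges)-1.
    """
    nets = defaultdict(list)
    for edge_id, (u, v) in enumerate(edges):
        nets[u].append(edge_id)
        nets[v].append(edge_id)
    return dict(nets)


def net_connectivity(nets, edge_block):
    """Return lambda(net): the number of distinct blocks among the net's pins."""
    return {vertex: len({edge_block[e] for e in pins}) for vertex, pins in nets.items()}


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
edge_block = [0, 0, 1, 1]  # hypothetical assignment of the 4 edges to k = 2 blocks
nets = build_dual_hypergraph(edges)
print(nets)                                  # {0: [0, 2], 1: [0, 1], 2: [1, 2, 3], 3: [3]}
print(net_connectivity(nets, edge_block))    # {0: 2, 1: 1, 2: 2, 3: 1}
```

Under this construction, $\lambda(v)$ equals the number of blocks that hold a copy of vertex $v$. The connectivity objective $\sum_v (\lambda(v) - 1)$ therefore differs from the total number of vertex replicas only by the constant $n$. Minimizing it on the hypergraph is thus equivalent to minimizing replication in the edge partition.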
Abstract
Addressing the challenges of processing massive graphs, which are prevalent in diverse fields such as social, biological, and technical networks, we introduce HeiStreamE and FreightE, two novel (buffered) streaming algorithms designed for efficient edge partitioning of large-scale graphs. HeiStreamE uses an adapted Split-and-Connect graph model and a Fennel-based multilevel partitioning scheme, while FreightE partitions a hypergraph representation of the input graph. In addition to ensuring superior solution quality, both approaches overcome the limitations of existing algorithms: their time and memory complexity is linear in the graph size and independent of the number of blocks in the partition. Our comprehensive experimental analysis demonstrates that HeiStreamE outperforms current streaming algorithms and the re-streaming algorithm 2PS in partitioning quality (replication factor). It is also more memory-efficient for real-world networks in which the number of edges is far greater than the number of vertices. Furthermore, FreightE is shown to compute partitions quickly and efficiently, particularly for larger numbers of blocks.
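The quality measure named in the abstract, the replication factor, is the average number of blocks in which a vertex has a copy. A vertex is copied into every block that contains at least one of its incident edges. The minimal sketch below computes it directly from an edge-to-block assignment. It also checks the balance constraint $|E_i| \le (1+\varepsilon)\lceil m/k \rceil$ that is commonly used in edge partitioning. The exact constraint, the value of $\varepsilon$, and the function names are assumptions for illustration, not taken from the paper.

```python
import math
from collections import defaultdict


def replication_factor(edges, edge_block):
    """Average number of blocks containing at least one edge incident to a vertex.

    Assumption: averages over vertices that occur in at least one edge
    (isolated vertices are ignored).
    """
    blocks_of = defaultdict(set)
    for (u, v), block in zip(edges, edge_block):
        blocks_of[u].add(block)
        blocks_of[v].add(block)
    return sum(len(b) for b in blocks_of.values()) / len(blocks_of)


def is_balanced(edge_block, k, eps):
    """Check |E_i| <= (1 + eps) * ceil(m / k) for every block i (assumed constraint)."""
    m = len(edge_block)
    limit = (1 + eps) * math.ceil(m / k)
    sizes = [0] * k
    for block in edge_block:
        sizes[block] += 1
    return all(size <= limit for size in sizes)


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
edge_block = [0, 0, 1, 1]  # hypothetical assignment to k = 2 blocks
print(replication_factor(edges, edge_block))   # 1.5, i.e. (2 + 1 + 2 + 1) / 4
print(is_balanced(edge_block, k=2, eps=0.03))  # True
```

A replication factor of 1.0 means no vertex is split across blocks. Streaming partitioners must trade that ideal against keeping the blocks balanced.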
