Play like a Vertex: A Stackelberg Game Approach for Streaming Graph Partitioning

Zezhong Ding, Yongan Xiang, Shangyou Wang, Xike Xie, S. Kevin Zhou

TL;DR

The paper tackles the challenge of partitioning huge graphs in streaming settings while controlling memory and communication costs. It introduces S5P, a skewness-aware approach that first clusters edges into head and tail groups, then allocates these clusters to partitions via a two-stage Stackelberg game, aided by Count-Min Sketch and parallelization to ensure efficiency. Theoretical analyses provide time, space, and game-theoretic guarantees, and extensive experiments on real and synthetic graphs show up to 81% lower communication and substantial runtime gains, with robust performance against graph skewness. Overall, S5P delivers significantly improved partition quality under load balance constraints and demonstrates practical impact for distributed graph processing systems like PowerGraph.

Abstract

In the realm of distributed systems tasked with managing and processing large-scale graph-structured data, optimizing graph partitioning stands as a pivotal challenge. The primary goal is to minimize communication overhead and runtime cost. However, alongside the computational complexity associated with optimal graph partitioning, a critical factor to consider is memory overhead. Real-world graphs often reach colossal sizes, making it impractical and economically unviable to load the entire graph into memory for partitioning. This is also a fundamental premise in distributed graph processing, where accommodating a graph with non-distributed systems is unattainable. Currently, existing streaming partitioning algorithms exhibit a skew-oblivious nature, yielding satisfactory partitioning results exclusively for specific graph types. In this paper, we propose a novel streaming partitioning algorithm, the Skewness-aware Vertex-cut Partitioner S5P, designed to leverage the skewness characteristics of real graphs for achieving high-quality partitioning. S5P offers high partitioning quality by segregating the graph's edge set into two subsets, head and tail sets. Following processing by a skewness-aware clustering algorithm, these two subsets subsequently undergo a Stackelberg graph game. Our extensive evaluations conducted on substantial real-world and synthetic graphs demonstrate that, in all instances, the partitioning quality of S5P surpasses that of existing streaming partitioning algorithms, operating within the same load balance constraints. For example, S5P can bring up to a 51% improvement in partitioning quality compared to the top partitioner among the baselines. Lastly, we showcase that the implementation of S5P results in up to an 81% reduction in communication cost and a 130% increase in runtime efficiency for distributed graph processing tasks on PowerGraph.

Paper Structure (25 sections, 6 theorems, 11 equations, 13 figures, 5 tables, 3 algorithms)


Key Result

Theorem 1

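The relative load balance $\tau$ is bounded above by $\frac{kL}{|E|}$, where $L$ is the per-partition load cap ($maxLoad$), $k$ is the number of partitions, and $|E|$ is the number of edges. To guarantee that the relative load balance does not exceed a target $t$, set $maxLoad$ to $\frac{t|E|}{k}$; substituting this value for $L$ makes the bound equal to $t$.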
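
The arithmetic behind Theorem 1 can be checked with a small sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes $L$ denotes $maxLoad$, assumes the common definition of relative load balance as the largest partition size divided by the ideal size $|E|/k$, and rounds the cap down to an integer number of edges. The function names and the sequential fill strategy are hypothetical, chosen only to exercise the bound.

```python
import math


def max_load_for_target(t, num_edges, k):
    """Largest integer per-partition cap L such that k * L / |E| <= t."""
    return math.floor(t * num_edges / k)


def relative_load_balance(sizes):
    """Assumed definition: largest partition size over the ideal size |E| / k."""
    return max(sizes) * len(sizes) / sum(sizes)


def fill_sequentially(num_edges, k, cap):
    """Toy assignment (hypothetical): fill each partition up to the cap in turn.

    This is deliberately unbalanced, so it pushes the largest partition
    as close to the cap as the edge count allows.
    """
    sizes = [0] * k
    remaining = num_edges
    for i in range(k):
        take = min(cap, remaining)
        sizes[i] = take
        remaining -= take
    if remaining:
        raise ValueError("cap too small to hold all edges")
    return sizes


# Toy numbers: 14 edges (as in the toy graph of Figure 3), k = 3, target t = 1.1.
num_edges, k, t = 14, 3, 1.1
cap = max_load_for_target(t, num_edges, k)
sizes = fill_sequentially(num_edges, k, cap)
tau = relative_load_balance(sizes)
print(cap, sizes, round(tau, 3))
```

With these toy values the cap is 5 edges per partition, the sequential fill yields partition sizes of 5, 5, and 4, and the resulting balance is 15/14 (about 1.071), which stays below the target of 1.1. Any assignment that respects the cap satisfies the same bound, since no partition can hold more than $L$ edges.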

Figures (13)

  • Figure 1: Distributions about Graph Skewness
  • Figure 2: Skewness-aware Vertex-cut Partitioner Framework
  • Figure 3: A Toy Graph with $12$ Vertices and $14$ Edges
  • Figure 4: Skewness-aware Streaming Graph Clustering ($k$=3)
  • Figure 5: Stackelberg Game-based Partitioning ($k$=$3$)
  • ...and 8 more figures

Theorems & Definitions (8)

  • Definition 1: Head and Tail Vertices/Edges/Clusters
  • Definition 2: Streaming Graph Clustering
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6