Balanced Partitioning for Optimizing Big Graph Computation: Complexities and Approximation Algorithms
Baoling Ning, Jianzhong Li
TL;DR
The paper addresses graph partitioning tailored to big-graph computation by introducing workload-driven (W$k$BGP) and motif-driven (M$k$BGP) objectives. It develops semidefinite programming representations to capture partitioning structure and applies sophisticated rounding to obtain bi-criteria $O(\sqrt{\log n \log k})$-approximation algorithms. It proves NP-hardness and inapproximability for motif-based partitioning (even with triangles) while delivering a tractable SDP-based approach for the triangle case and extending it to general motifs. Together, these results yield principled partitioning methods with guarantees that improve workload performance and motif computation on large-scale graphs.
Abstract
Graph partitioning is a fundamental problem in big graph computation. Previous works do not consider the practical requirements that arise when optimizing big data analysis in real applications. In this paper, motivated by the optimization of big data computing applications, two typical graph partitioning problems are studied. The first is to optimize the performance of specific workloads by graph partitioning, a problem that lacks algorithms with performance guarantees. The second is to optimize the computation of motifs by graph partitioning, which previous works have not addressed. First, formal definitions of the two problems are introduced, and semidefinite programming representations are designed based on an analysis of their properties. The motif-based partitioning problem is proved to be NP-complete even in the special case where $k=2$ and the motif is a triangle, and its inapproximability is shown by proving that there are no efficient algorithms with a finite approximation ratio. Finally, using semidefinite programming and sophisticated rounding techniques, polynomial-time bi-criteria $O(\sqrt{\log n\log k})$-approximation algorithms are designed and analyzed for both problems.
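To make the motif-driven setting concrete, the following minimal Python sketch scores a given partition on a toy graph when the motif is a triangle. It rests on an assumption not stated in the abstract: that the quantity of interest is the number of triangles whose vertices do not all fall in the same part, with balance measured by the largest part size. The function names `cut_triangles` and `max_part_size` and the example graph are hypothetical, and the sketch only evaluates partitions; it does not implement the SDP-based algorithms.

```python
from itertools import combinations


def cut_triangles(edges, part):
    """Return (total triangles, triangles split across parts).

    Assumed objective: a triangle counts as "cut" when its three
    vertices are not all assigned to the same part.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = cut = 0
    # Brute-force enumeration of vertex triples; fine for toy graphs only.
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            total += 1
            if len({part[u], part[v], part[w]}) > 1:
                cut += 1
    return total, cut


def max_part_size(part):
    """Size of the largest part, used here as a simple balance measure."""
    sizes = {}
    for p in part.values():
        sizes[p] = sizes.get(p, 0) + 1
    return max(sizes.values())


# Two vertex-disjoint triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Both partitions use k = 2 parts of size 3, so both are balanced.
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # keeps each triangle intact
bad = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0}   # splits both triangles

print(cut_triangles(edges, good), max_part_size(good))
print(cut_triangles(edges, bad), max_part_size(bad))
```

Both partitions are equally balanced, yet under the assumed objective they differ in how many triangles they split, which is the kind of distinction a motif-driven objective is meant to capture.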
