An improvement of degree-based hashing (DBH) graph partition method, using a novel metric
Anna Mastikhina, Oleg Senkevich, Dmitry Sirotkin, Danila Demin, Stanislav Moiseev
TL;DR
The paper tackles scalable graph partitioning for distributed graph computing by introducing a novel metric, MSIDS (maximal sum of inner degrees squared), to capture inner-degree concentration, and by proposing DBH-X, an enhanced partitioner that uses a degree threshold and a spread technique. DBH-X extends the traditional Degree-Based Hashing (DBH) method, improving both the replication factor (RF) and MSIDS and balancing the two for better end-to-end runtimes. The authors establish theoretical connections between RF and MSIDS, and show in GraphX-based experiments on large power-law graphs that DBH-X significantly accelerates PageRank and Label Propagation compared to EdgePartition2D and vanilla DBH. These results highlight the practical value of combining degree-aware partitioning with MSIDS-oriented objectives for real-world large-scale graphs.
Abstract
This paper examines the graph partition problem and introduces a new metric, MSIDS (maximal sum of inner degrees squared). We establish its connection to replication factor (RF) optimization, which has been the main focus of theoretical work in this field. We also propose a new partition algorithm, DBH-X, based on the DBH partitioner. We demonstrate that DBH-X significantly improves both the RF and MSIDS compared to the baseline DBH algorithm. Finally, we provide test results that show the runtime acceleration of GraphX-based PageRank and Label Propagation algorithms.
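
To make the two metrics concrete, the following is a minimal Python sketch of the baseline DBH partitioner together with RF and MSIDS. It is not the authors' implementation, and it does not cover DBH-X, whose degree threshold and spread technique are not specified here. Several things are assumptions: baseline DBH is taken to assign each edge by hashing its lower-degree endpoint, vertex ids are integers, `v % num_parts` stands in for a real hash function, and MSIDS is read literally from its name as the maximum over partitions of the sum of squared inner degrees (a vertex's degree counted only over edges in that partition). The function names `dbh_partition`, `replication_factor`, and `msids` are illustrative.

```python
from collections import Counter, defaultdict


def dbh_partition(edges, num_parts):
    """Baseline DBH sketch: place each edge by hashing its lower-degree endpoint."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    assignment = {}
    for u, v in edges:
        # Ties go to u; this tie-break is an arbitrary choice for the sketch.
        key = u if degree[u] <= degree[v] else v
        # Assumption: integer vertex ids, with modulo standing in for a hash.
        assignment[(u, v)] = key % num_parts
    return assignment


def replication_factor(assignment):
    """Average number of partitions in which a vertex appears."""
    parts_of = defaultdict(set)
    for (u, v), part in assignment.items():
        parts_of[u].add(part)
        parts_of[v].add(part)
    return sum(len(parts) for parts in parts_of.values()) / len(parts_of)


def msids(assignment, num_parts):
    """Assumed reading of MSIDS: max over partitions of sum of squared inner degrees."""
    inner_degree = [Counter() for _ in range(num_parts)]
    for (u, v), part in assignment.items():
        inner_degree[part][u] += 1
        inner_degree[part][v] += 1
    return max(sum(d * d for d in counts.values()) for counts in inner_degree)


if __name__ == "__main__":
    # Star graph: hub 0 connected to leaves 1..4, split into two partitions.
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]
    parts = dbh_partition(star, 2)
    print(parts)
    print(replication_factor(parts))
    print(msids(parts, 2))
```

On the star graph in the usage example, each edge is hashed by its leaf, so the hub is replicated in both partitions while every leaf stays in one. Under the assumed definition, concentrating all of the hub's edges in a single partition would lower RF but raise MSIDS, which is the kind of trade-off the abstract refers to.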
