An improvement of degree-based hashing (DBH) graph partition method, using a novel metric

Anna Mastikhina, Oleg Senkevich, Dmitry Sirotkin, Danila Demin, Stanislav Moiseev

TL;DR

The paper tackles scalable graph partitioning for distributed graph computing by introducing a novel metric, MSIDS, to capture inner-degree concentration, and by proposing DBH-X, an enhanced partitioner that uses a degree threshold and a spread technique. DBH-X extends the traditional Degree-Based Hashing (DBH) method, achieving improved replication factor and MSIDS, and balancing them for better end-to-end runtimes. The authors provide theoretical connections between RF and MSIDS, and demonstrate via GraphX-based experiments on large power-law graphs that DBH-X yields significant runtime accelerations for PageRank and Label Propagation compared to EdgePartition2D and vanilla DBH. These results highlight the practical value of combining degree-aware partitioning with MSIDS-oriented objectives for real-world large-scale graphs.

Abstract

This paper examines the graph partition problem and introduces a new metric, MSIDS (maximal sum of inner degrees squared). We establish its connection to the replication factor (RF) optimization, which has been the main focus of theoretical work in this field. Additionally, we propose a new partition algorithm, DBH-X, based on the DBH partitioner. We demonstrate that DBH-X significantly improves both the RF and MSIDS, compared to the baseline DBH algorithm. In addition, we provide test results that show the runtime acceleration of GraphX-based PageRank and Label propagation algorithms.
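As a rough illustration of the quantities named above, the following sketch partitions a tiny graph with baseline DBH and computes RF and an MSIDS-like value. Two points are assumptions, not taken from the paper. First, DBH is implemented in its commonly described form: each edge is assigned by hashing its lower-degree endpoint. Second, MSIDS is read literally from its name as the maximum, over partitions, of the sum of squared inner degrees, where a vertex's inner degree is the number of its edges inside that partition. The paper's exact definition may differ. The function names are hypothetical, and DBH-X's threshold and spread steps are not modelled.

```python
from collections import defaultdict


def dbh_partition(edges, num_parts):
    """Baseline DBH sketch: hash the lower-degree endpoint of each edge."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    parts = []
    for u, v in edges:
        # Ties are broken in favour of u (an arbitrary choice for this sketch).
        key = u if degree[u] <= degree[v] else v
        parts.append(hash(key) % num_parts)
    return parts


def replication_factor(edges, parts):
    """Average number of partitions in which a vertex appears."""
    replicas = defaultdict(set)
    for (u, v), p in zip(edges, parts):
        replicas[u].add(p)
        replicas[v].add(p)
    return sum(len(s) for s in replicas.values()) / len(replicas)


def msids(edges, parts, num_parts):
    """Assumed literal reading: max over partitions of sum of squared inner degrees."""
    inner = [defaultdict(int) for _ in range(num_parts)]
    for (u, v), p in zip(edges, parts):
        inner[p][u] += 1
        inner[p][v] += 1
    return max(sum(d * d for d in part.values()) for part in inner)


# Star graph: hub 0 connected to leaves 1..4, split into 2 partitions.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
parts = dbh_partition(edges, 2)
rf = replication_factor(edges, parts)
score = msids(edges, parts, 2)
```

In this star graph every edge is hashed by its leaf, so only the high-degree hub is replicated. That is the behaviour degree-based hashing aims for on power-law graphs.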

Paper Structure (20 sections, 5 theorems, 38 equations, 5 figures, 9 tables)

This paper contains 20 sections, 5 theorems, 38 equations, 5 figures, 9 tables.

Key Result

Theorem 5.1

A randomized vertex-cut in $m$ partitions has the expected replication factor of

Figures (5)

  • Figure 1: Edge2D partition
  • Figure 2: DBH partition (red lines: edges in partition 0; yellow: partition 1; blue: partition 2)
  • Figure 3: Using spread = 2
  • Figure 4: PageRank algorithm runtime using Edge2D and DBH-X partitions for graph uk-2002
  • Figure 5: Label propagation algorithm runtime using Edge2D, DBH and DBH-X partitions for graph graph500-24

Theorems & Definitions (7)

  • Theorem 5.1
  • Theorem 5.2
  • Corollary 5.2.1
  • proof
  • Corollary 5.2.2
  • Theorem 5.3
  • proof