StarDist: A Code Generator for Distributed Graph Algorithms

Barenya Kumar Nandy, Rupesh Nasre

TL;DR

StarPlat introduces a distributed graph algorithm DSL with an MPI backend and an analysis-transformation framework that optimizes communication, neighborhood traversal, and reduction via a bulk-reduction substrate. It emphasizes reduction-exclusive statements, opportunistic caching, and cache-friendly synchronization to minimize RMA overhead. Empirical results on SSSP and CC show significant speedups over DRONE and d-Galois under distributed workloads. The work also outlines an extensible backend analyzer and future directions, including integration with graph partitioners such as METIS for further scalability.

Abstract

Real-world relational data are often structured as graphs, which provide the logical abstraction needed to simplify analytical derivations. As graphs get larger, the irregular access patterns exhibited by most graph algorithms hamper performance. This, along with NUMA effects and physical memory limits, makes scaling difficult for sequential and shared-memory frameworks. StarPlat's MPI backend abstracts away the programmatic complexity involved in designing optimal distributed graph algorithms. It provides an instrument for coding graph algorithms that scale over distributed memory. In this work, we provide an analysis-transformation framework that leverages the general semantics of iterations over nodes and their neighbors within StarPlat to aggregate communication. The framework scans for patterns that warrant re-ordering neighborhood accesses, aggregating communication, or avoiding communication altogether through opportunistic caching in reduction constructs. We also architect an optimized bulk-reduction substrate using Open MPI's passive Remote Memory Access (RMA) constructs. We applied our optimization logic to StarPlat's distributed backend and outperformed d-Galois by 2.05× and DRONE by 1.44× in Single Source Shortest Paths across several big data graphs.
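
To make the idea of aggregated reductions concrete, the following is a minimal sketch, not the code StarPlat generates. It simulates in plain Python, without MPI, how per-edge min-reductions (as in SSSP relaxation) might be combined locally and grouped by destination rank, so that each rank receives one bulk update instead of one remote access per edge. The function names, the modulo ownership rule, and the dictionary-based buffers are all assumptions made for illustration.

```python
from collections import defaultdict


def owner_of(vertex, num_ranks):
    # Hypothetical ownership rule (an assumption): vertex v lives on rank v % num_ranks.
    return vertex % num_ranks


def aggregate_min_updates(updates, num_ranks):
    """Combine (vertex, candidate_distance) pairs locally.

    Keeps only the smallest candidate per vertex and groups the survivors
    by the rank that owns the vertex, so one batch goes to each rank.
    """
    per_rank = defaultdict(dict)
    for vertex, dist in updates:
        bucket = per_rank[owner_of(vertex, num_ranks)]
        # Local combining: a larger candidate can never win the min-reduction,
        # so it is dropped here and never communicated.
        if vertex not in bucket or dist < bucket[vertex]:
            bucket[vertex] = dist
    return {rank: dict(bucket) for rank, bucket in per_rank.items()}


def apply_bulk_min(local_dist, batch):
    """Apply one aggregated batch on the owning rank; return how many entries changed."""
    changed = 0
    for vertex, dist in batch.items():
        if dist < local_dist.get(vertex, float("inf")):
            local_dist[vertex] = dist
            changed += 1
    return changed


# Five per-edge updates collapse into two batches (one per rank).
updates = [(4, 7.0), (5, 3.0), (4, 2.0), (6, 9.0), (5, 8.0)]
batches = aggregate_min_updates(updates, num_ranks=2)

rank0_dist = {4: 5.0}
changed_on_rank0 = apply_bulk_min(rank0_dist, batches[0])
```

In a real distributed setting, each batch would be shipped to its owner in a single communication step (for example, through an RMA window); the sketch only shows the local combining that reduces the number of values sent.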

Paper Structure

This paper contains 10 sections, 22 figures, 3 tables, 2 algorithms.

Figures (22)

  • Figure 1: Left: StarPlat Right: DSL code sample.
  • Figure 2: Background
  • Figure 3: Profiling observations
  • Figure 8: Communication Profile before and after
  • Figure 9: StarPlat MPI header structure before and after
  • ...and 17 more figures