
Massively Parallel Algorithms for Approximate Shortest Paths

Michal Dory, Shaked Matar

TL;DR

The paper studies fast approximate shortest-path computation in the MPC model for unweighted, undirected graphs. It combines two core constructs: limited-scale hopsets for short distances and near-additive emulators for long distances. For the near-linear-memory MPC model, it gives randomized algorithms that compute (1+ε)-approximate SSSP in poly(log log n) rounds. It also builds a distance oracle that answers (1+ε)(2k-1)-approximate distance queries in O(1) additional rounds. A general framework based on sampling and edge selection yields both the hopsets and the emulators. Careful memory management lets these tools be built in the sublinear MPC model, so the end results need only one machine with near-linear memory while all other machines use sublinear memory. The approach improves on previous near-linear MPC algorithms, which either required Ω(log n) rounds or had an Ω(log n) approximation. For distances between arbitrary pairs it combines limited-scale distance sketches with emulators, and spanner-based optimizations provide memory-speed trade-offs. The work advances scalable distance computation in MPC, with practical relevance for large-scale graph analytics in MapReduce-like environments.
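Hopsets matter in MPC because each round can only extend paths by a bounded number of edges. A hopset is a set of extra weighted edges that lets a few hops recover (approximately) correct distances. The sketch below is a minimal sequential illustration, not the paper's construction. It runs Bellman-Ford for exactly `beta` synchronous rounds. The shortcut edges in the usage example are hand-picked with exact weights, purely as a toy stand-in for a hopset. The function name `hop_limited_distances` is hypothetical.

```python
import math


def hop_limited_distances(n, edges, source, beta):
    """Bellman-Ford limited to `beta` synchronous rounds.

    dist[v] is the length of a shortest source-v path using at most `beta`
    edges, or math.inf if no such path exists. `edges` holds undirected
    weighted edges (u, v, w) on vertices 0..n-1.
    """
    dist = [math.inf] * n
    dist[source] = 0
    for _ in range(beta):
        new = dist[:]  # read old values only, so each round adds at most one hop
        for u, v, w in edges:
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w
            if dist[v] + w < new[u]:
                new[u] = dist[v] + w
        dist = new
    return dist


# A path 0-1-...-8 with unit weights: 2 hops cannot reach vertex 8.
path = [(i, i + 1, 1) for i in range(8)]
print(hop_limited_distances(9, path, 0, 2)[8])

# Toy "hopset": two shortcut edges carrying exact distances.
shortcuts = [(0, 4, 4), (4, 8, 4)]
print(hop_limited_distances(9, path + shortcuts, 0, 2)[8])
```

With the shortcuts, two rounds suffice to recover the exact distance 8 from vertex 0 to vertex 8, whereas the bare path needs eight rounds.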

Abstract

We present fast algorithms for approximate shortest paths in the massively parallel computation (MPC) model. We provide randomized algorithms that take $poly(\log{\log{n}})$ rounds in the near-linear memory MPC model. Our results are for unweighted undirected graphs with $n$ vertices and $m$ edges. Our first contribution is a $(1+ε)$-approximation algorithm for Single-Source Shortest Paths (SSSP) that takes $poly(\log{\log{n}})$ rounds in the near-linear MPC model, where the memory per machine is $\tilde{O}(n)$ and the total memory is $\tilde{O}(mn^ρ)$, where $ρ$ is a small constant. Our second contribution is a distance oracle that allows one to approximate the distance between any pair of vertices. The distance oracle is constructed in $poly(\log{\log{n}})$ rounds and allows querying a $(1+ε)(2k-1)$-approximate distance between any pair of vertices $u$ and $v$ in $O(1)$ additional rounds. The algorithm is for the near-linear memory MPC model with total memory of size $\tilde{O}((m+n^{1+ρ})n^{1/k})$, where $ρ$ is a small constant. While our algorithms are for the near-linear MPC model, they in fact use only one machine with $\tilde{O}(n)$ memory, while the rest of the machines can have sublinear memory of size $O(n^γ)$ for a small constant $γ< 1$. All previous algorithms for approximate shortest paths in the near-linear MPC model either required $Ω(\log{n})$ rounds or had an $Ω(\log{n})$ approximation. Our approach is based on fast construction of near-additive emulators, limited-scale hopsets and limited-scale distance sketches that are tailored for the MPC model. While our end results are for the near-linear MPC model, many of the tools we construct, such as hopsets and emulators, are constructed in the more restricted sublinear MPC model.
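The $(2k-1)$ stretch factor and the $n^{1/k}$ memory overhead in the oracle bound match the classic Thorup-Zwick distance oracle. As an illustration only (not the paper's MPC construction, and without the $(1+ε)$ component), here is a minimal sequential sketch of that classic oracle for a connected unweighted graph.

The sketch samples levels $A_0 = V \supseteq A_1 \supseteq \dots \supseteq A_{k-1}$ with probability $n^{-1/k}$ per level. For each vertex it stores its nearest vertex in each level and a "bunch" of nearby vertices. Each bunch has expected size $O(kn^{1/k})$. All-pairs BFS during construction keeps the code short but is only suitable for small graphs. The class name `ThorupZwickOracle` and the adjacency-dict input format are assumptions.

```python
import math
import random
from collections import deque


def bfs(adj, sources):
    """Multi-source BFS: distance to the nearest source, and which source it is."""
    dist, nearest = {}, {}
    queue = deque()
    for s in sources:
        dist[s], nearest[s] = 0, s
        queue.append(s)
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], nearest[y] = dist[x] + 1, nearest[x]
                queue.append(y)
    return dist, nearest


class ThorupZwickOracle:
    """Classic sequential Thorup-Zwick oracle (connected, unweighted graph).

    query(u, v) returns est with d(u, v) <= est <= (2k - 1) * d(u, v).
    """

    def __init__(self, adj, k, seed=0):
        rng = random.Random(seed)
        p = len(adj) ** (-1.0 / k)
        # Sample A_0 = V, A_1, ..., A_{k-1}; resample until A_{k-1} is nonempty.
        while True:
            levels = [set(adj)]
            for _ in range(1, k):
                levels.append({w for w in levels[-1] if rng.random() < p})
            if levels[-1]:
                break
        # pivot[i][v] = a nearest vertex of A_i to v; pdist[i][v] = d(A_i, v).
        self.pivot, self.pdist = [], []
        for i in range(k):
            d, near = bfs(adj, levels[i])
            self.pivot.append(near)
            self.pdist.append(d)
        # level[w] = largest i such that w is in A_i.
        level = {w: max(i for i in range(k) if w in levels[i]) for w in adj}
        # bunch[v][w] = d(w, v) for every w strictly closer to v than A_{level(w)+1}.
        self.bunch = {v: {} for v in adj}
        for w in adj:
            dw, _ = bfs(adj, [w])
            j = level[w]
            for v, d in dw.items():
                limit = self.pdist[j + 1][v] if j + 1 < k else math.inf
                if d < limit:
                    self.bunch[v][w] = d

    def query(self, u, v):
        # Walk up the levels, alternating endpoints, until the pivot lies in a bunch.
        w, i = u, 0
        while w not in self.bunch[v]:
            i += 1
            u, v = v, u
            w = self.pivot[i][u]
        return self.pdist[i][u] + self.bunch[v][w]  # d(u, w) + d(w, v)
```

The query loop runs at most $k$ iterations, because every vertex of $A_{k-1}$ belongs to every bunch. With $k = 1$ the oracle degenerates to storing all exact distances.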

Paper Structure

This paper contains 67 sections, 48 theorems, 95 equations, 2 figures, 6 algorithms.

Key Result

Theorem 1.2

Given an unweighted, undirected graph $G=(V,E)$ on $n$ vertices, a parameter $\epsilon <1$ and a constant $\rho \in[ 1/{\log {\log n}}, 1/2]$, there is a randomized algorithm that computes a $(1+\epsilon)$-approximation for SSSP with high probability (w.h.p.). The algorithm works in the near-linear MP
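The abstract states a total memory of $\tilde{O}(mn^{\rho})$, so the range allowed for $\rho$ controls the memory overhead factor $n^{\rho}$. The following worked example gives a sense of that range. It uses $n = 2^{32}$ and base-2 logarithms, which are choices made here for illustration only:

```latex
\log_2 n = 32, \qquad \log_2 \log_2 n = 5, \qquad
n^{1/\log_2\log_2 n} = 2^{32/5} = 2^{6.4} \approx 84.4, \qquad
n^{1/2} = 2^{16} = 65{,}536.
```

At the smallest allowed $\rho$, the overhead factor is therefore below 100 for a graph with about four billion vertices. At $\rho = 1/2$ it is $65{,}536$.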

Figures (2)

  • Figure 1: The path from $u$ to $v$ using the edges of the set $Q$. The solid straight line depicts the path $\pi(u,v)$ in the graph $G$. The dashed lines represent the partition of the subpaths $\pi(u,u')$ and $\pi(v',v)$ into segments of length at most $\frac{1}{2} \alpha\left({1}/{\epsilon}\right)^{i-1}$. The dotted lines represent the paths of at most $i$ edges in $Q$ from $u$ (and $v$) to $p_i(u')$ (and $p_i(v')$). The curved edge represents the edge $\{p_i(u'),p_i(v')\}$ in $Q$.
  • Figure 2: The $u-v$ path in $Q$. The solid line depicts the shortest path $\pi(u,v)$ in $G$. Dashed lines depict the division of $\pi(u,v)$ into segments. Dotted lines depict the path we have between the endpoints of each segment by the corresponding corollary about $Q$.
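Both figures rely on the same step: cutting a shortest path into consecutive segments of bounded length and then joining each segment's endpoints by a short path in $Q$. The helper below sketches only the cutting step on a path given as a vertex list. It assumes an integer length bound, for example the floor of $\frac{1}{2}\alpha(1/\epsilon)^{i-1}$ from Figure 1. The function name `split_into_segments` is hypothetical. A path with $L$ edges yields $\lceil L/s \rceil$ segments for bound $s$, so any per-segment hop bound in $Q$ multiplies by that count.

```python
def split_into_segments(path, max_len):
    """Cut a path (list of vertices) into consecutive subpaths with at most
    `max_len` edges each. Consecutive segments share their boundary vertex."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    segments = []
    start, last = 0, len(path) - 1
    while start < last:
        end = min(start + max_len, last)
        segments.append(path[start:end + 1])
        start = end  # the next segment starts where this one ended
    return segments


# A path with 10 edges and bound 4 gives ceil(10 / 4) = 3 segments.
print(split_into_segments(list(range(11)), 4))
```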

Theorems & Definitions (73)

  • Theorem 1.2
  • Theorem 1.3
  • Lemma 3.1
  • Theorem 3.2
  • Lemma 3.3
  • Theorem 3.4
  • Lemma 3.5
  • proof
  • Lemma 3.6
  • proof
  • ...and 63 more