Parallel Contraction Hierarchies Can Be Efficient and Scalable

Zijin Wan, Xiaojun Dong, Letong Wang, Enzuo Zhu, Yan Gu, Yihan Sun

TL;DR

The paper addresses the bottleneck of preprocessing in Contraction Hierarchies for road networks by introducing SPoCH, a scalable parallel CH construction framework. SPoCH combines a LocalSearch step with memoization, batching of Witness Path Searches, and a lazy, lock-free overlay-update mechanism facilitated by phase-concurrent hash tables, achieving large speedups in CH construction while preserving competitive query performance. Across 16 graphs, SPoCH delivers 11–68× speedups over the best sequential baseline and 3.8–41× over the best parallel baseline, with self-relative speedups up to around 70× on 96 cores. The work demonstrates the practical impact of algorithmic redesign and parallel data structures for large-scale shortest-path preprocessing and opens avenues for applying these ideas to other distance queries and higher-degree graphs.

Abstract

Contraction Hierarchies (CH) (Geisberger et al., 2008) is one of the most widely used algorithms for shortest-path queries on road networks. Compared to Dijkstra's algorithm, CH enables orders of magnitude faster query performance through a preprocessing phase, which iteratively categorizes vertices into hierarchies and adds shortcuts. However, constructing a CH is expensive, and existing solutions, including parallel ones, may suffer from long construction times. In particular, our experiments show that existing parallel solutions scale poorly and perform close to sequential algorithms. We present SPoCH (Scalable Parallelization of Contraction Hierarchies), an efficient and scalable parallel algorithm for CH construction. To address the challenges in previous work, our improvements focus on both redesigning the algorithm and leveraging parallel data structures. We compare SPoCH with state-of-the-art sequential and parallel implementations on 16 graphs of various types. Our experiments show that SPoCH achieves speedups of 11 to 68 times over the best sequential baseline and 3.8 to 41 times over the best parallel baseline in CH construction, while maintaining competitive query performance and CH graph size. We have released our code and all datasets used in this paper.
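To make the preprocessing step concrete, the following is a minimal sequential sketch of contracting a single vertex, assuming an undirected graph stored as a dictionary of dictionaries. It illustrates the general CH idea of adding a shortcut only when no witness path exists; it is not the SPoCH algorithm, and the names `witness_distance` and `contract` are hypothetical.

```python
import heapq
from itertools import combinations

INF = float("inf")

def witness_distance(graph, source, target, skip, limit):
    """Dijkstra from source to target that never passes through `skip`.

    The search stops early once settled distances exceed `limit`.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        if u == target or d > limit:
            break
        for w, weight in graph[u].items():
            if w == skip:
                continue
            nd = d + weight
            if nd < dist.get(w, INF):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist.get(target, INF)

def contract(graph, v):
    """Remove v from graph, adding shortcuts that preserve distances."""
    shortcuts = []
    for a, b in combinations(sorted(graph[v]), 2):
        via = graph[v][a] + graph[v][b]
        # A witness path is a path a -> b avoiding v that is no longer than via.
        if witness_distance(graph, a, b, skip=v, limit=via) > via:
            shortcuts.append((a, b, via))
    for a, b, weight in shortcuts:
        graph[a][b] = graph[b][a] = weight
    for a in list(graph[v]):
        del graph[a][v]
    del graph[v]
    return shortcuts
```

For example, contracting the middle vertex of a unit-weight path A, v, B yields the single shortcut `("A", "B", 2)`, whereas if the edge A-B of weight 1 already exists, that edge serves as the witness and no shortcut is added.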

Paper Structure

This paper contains 21 sections, 9 figures, 5 tables, 6 algorithms.

Figures (9)

  • Figure 1: Illustration of the construction of Contraction Hierarchies. For simplicity, we assume the input graph has unit weights, omitting the weight "1" from the graph.
  • Figure 2: An illustration of the notation and the benefit of batching in SPoCH. The two figures illustrate the vertex sets $\mathit{V_A}$, $\mathit{V_F}$, $\mathit{V_S}$, and $\mathit{V_W}$. A node in both purple and green is in both $\mathit{V_W}$ and $\mathit{V_S}$. On the right, we illustrate how batching saves WPSes. To rescore the four vertices in $\mathit{V_A}$, for each $v\in \mathit{V_A}$, previous solutions perform WPSes on all pairs $N_{\scriptsize \hbox{\it in}}(v)\times N_{\scriptsize \hbox{\it out}}(v)$. SPoCH identifies all possible WPS sources in the Contract step of the previous round and collects them in $\mathit{V_W}$. Therefore, the WPSes from the same source are conducted only once. In this example, SPoCH needs only 5 WPSes instead of 9.
  • Figure 3: An illustration of concurrent shortcut insertions in SPoCH. During the Contract step, for every feasible candidate $u \in \mathit{V_F}$, a dedicated thread inserts shortcuts between each pair $(v_1,v_2)$ with $v_1 \in N_{\scriptsize \hbox{\it in}}(u)$ and $v_2 \in N_{\scriptsize \hbox{\it out}}(u)$. In the example, three threads simultaneously attempt to insert edges $(D,B)$, $(D,F)$, $(D,G)$, which share the same source $D$, potentially creating a race condition.
  • Figure 4: Build time and query time of all tested baselines, normalized to our algorithm. Lower is better. Since all running times are normalized to our algorithm, "Ours" is always equal to one, represented by the horizontal red dotted line.
  • Figure 5: Heatmap of number of CH edges, query time, and query iteration. All numbers are normalized to ours. Bluer or smaller is better.
  • ...and 4 more figures
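The batching idea in the Figure 2 caption can be sketched as grouping the affected vertices by their WPS source, so that each source is searched from only once. The sketch below assumes that one WPS is a single search from one source that serves every vertex grouped under it; the function name `batch_by_source` and the example graph are hypothetical and do not reproduce the numbers in the figure.

```python
from collections import defaultdict

def batch_by_source(in_nbrs, affected):
    """Map each WPS source to the affected vertices it has to serve.

    in_nbrs[v] lists the in-neighbors of v, which are the WPS sources for v.
    """
    batches = defaultdict(list)
    for v in affected:
        for source in in_nbrs[v]:
            batches[source].append(v)
    return dict(batches)

# Hypothetical example: three affected vertices sharing three sources.
in_nbrs = {"a": ["s1", "s2"], "b": ["s2", "s3"], "c": ["s1", "s3"]}
affected = ["a", "b", "c"]

batches = batch_by_source(in_nbrs, affected)
unbatched_searches = sum(len(in_nbrs[v]) for v in affected)  # one search per (vertex, source)
batched_searches = len(batches)                              # one search per distinct source
print(unbatched_searches, batched_searches)  # prints "6 3"
```

In this toy input, searching per vertex launches six searches, while grouping by source launches three, because each source is shared by two affected vertices.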