High-Performance Parallelization of Dijkstra's Algorithm Using MPI and CUDA
Boyang Song
TL;DR
This work accelerates shortest-path computation on large graphs by implementing and comparing three versions of Dijkstra's algorithm: a serial baseline, an MPI-based parallel version, and a CUDA-based parallel version, all using a common adjacency-matrix representation. Relative to the serial implementation, the parallel versions achieve substantial speedups: more than $5\times$ for MPI and more than $10\times$ for CUDA. Two challenges persist, namely synchronization overhead and the memory cost of adjacency matrices on large graphs. The study evaluates performance across a range of graph sizes and densities, noting that communication and load-balancing limitations constrain MPI scalability, whereas GPU parallelism offers strong gains when data transfer and kernel efficiency are optimized. The results provide practical guidance for HPC implementations of parallel shortest-path computation and highlight the trade-offs between CPU-based MPI and GPU-based CUDA for graph analytics.
Abstract
This paper investigates the parallelization of Dijkstra's algorithm for computing shortest paths in large-scale graphs using MPI and CUDA. The primary hypothesis is that parallel computing can significantly reduce computation time compared to a serial implementation. To validate this, I implemented three versions of the algorithm: a serial version, an MPI-based parallel version, and a CUDA-based parallel version. Experimental results demonstrate that the MPI implementation achieves over 5x speedup, while the CUDA implementation attains more than 10x speedup, both relative to the serial benchmark. However, the study also reveals inherent challenges in parallelizing Dijkstra's algorithm: its main loop is sequential, since each iteration must select the global minimum-distance vertex before the next can begin, and coordinating that selection across workers incurs significant synchronization overhead. Furthermore, the use of an adjacency matrix as the data structure is examined, highlighting its impact on memory consumption and performance in both dense and sparse graphs.
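To make the structure of the serial baseline concrete, the following is a minimal sketch of Dijkstra's algorithm over a dense adjacency matrix. It is an illustration under stated assumptions, not the code evaluated in this paper: the function name `dijkstra_dense`, the row-major layout, the use of weight 0 to mean "no edge", and the `INF` sentinel are all hypothetical choices, and edge weights are assumed to be positive and small enough that path sums fit in an `int`.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sentinel for "not reached yet"; not taken from the paper.
constexpr int INF = std::numeric_limits<int>::max();

// graph is an n x n adjacency matrix stored row-major:
// graph[u * n + v] is the weight of edge u -> v, and 0 means "no edge" (assumption).
std::vector<int> dijkstra_dense(const std::vector<int>& graph,
                                std::size_t n, std::size_t source) {
    std::vector<int> dist(n, INF);
    std::vector<bool> done(n, false);
    dist[source] = 0;

    for (std::size_t iter = 0; iter < n; ++iter) {
        // Step 1: pick the unfinished vertex with the smallest tentative distance.
        // This global minimum is what forces synchronization in a parallel version.
        std::size_t u = n;
        int best = INF;
        for (std::size_t v = 0; v < n; ++v) {
            if (!done[v] && dist[v] < best) {
                best = dist[v];
                u = v;
            }
        }
        if (u == n) break;  // every remaining vertex is unreachable
        done[u] = true;

        // Step 2: relax all edges leaving u by scanning row u of the matrix.
        // Each v is independent here, so this loop is the natural place to split work.
        for (std::size_t v = 0; v < n; ++v) {
            const int w = graph[u * n + v];
            if (w > 0 && !done[v] && best + w < dist[v]) {
                dist[v] = best + w;
            }
        }
    }
    return dist;
}
```

Both inner loops scan all n vertices, so the sketch runs in O(n^2) time regardless of how many edges exist. The matrix itself holds n^2 entries, which at 4 bytes per 32-bit entry is 4n^2 bytes: roughly 40 GB for n = 100,000. This is the memory cost the abstract refers to. One common way to parallelize this structure, offered here only as an assumption about the general approach, is to give each worker a contiguous range of vertices, compute a local minimum in Step 1, and combine the local minima with a global reduction before Step 2 proceeds.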
