DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs
Yelai Feng, Huaixi Wang, Yining Zhu, Xiandong Liu, Hongyi Lu, Qing Liu
TL;DR
DAWN introduces a matrix operation-optimized algorithm for unweighted SSSP and APSP that leverages Boolean vector-matrix operations, specifically BOVM and SOVM, to reduce computation and memory compared with traditional BFS and prior matrix-based methods. It achieves $O(E_{wcc}(i))$ time for SSSP and $O(S_{wcc} \cdot E_{wcc})$ time for APSP with $O(m)$ space, and reports average speedups of $3.769\times$ over Gunrock and $9.410\times$ over GAP across a diverse benchmark set, while maintaining lower GPU memory usage. The approach emphasizes high parallelism and scalability, validated on GPUs and CPUs across large graphs, and shows particular strength on sparse graphs with large weakly connected components. The work lays groundwork for extending the method to weighted graphs using min-plus operations.
Abstract
The shortest paths problem is a fundamental challenge in graph theory, with a broad range of potential applications. Algorithms based on matrix multiplication exhibit excellent parallelism and scalability, but are constrained by high memory consumption and algorithmic complexity. Traditional shortest paths algorithms, such as BFS and Dijkstra's algorithm, are limited by priority queues, making the improvement of their parallelism a focal issue. We propose a matrix operation-optimized algorithm, which offers improved parallelism, reduced time complexity, and lower memory consumption. The novel algorithm requires $O(E_{wcc}(i))$ and $O(S_{wcc} \cdot E_{wcc})$ time for the single-source and all-pairs shortest paths problems, respectively, where $S_{wcc}$ and $E_{wcc}$ denote the number of nodes and edges in the largest weakly connected component of the graph. To evaluate the effectiveness of the novel algorithm, we tested it using graphs from the SuiteSparse Matrix Collection and the Gunrock benchmark dataset. Our algorithm outperformed the BFS implementations from Gunrock and GAP (the previous state-of-the-art solution), achieving average speedups of 3.769$\times$ and 9.410$\times$, respectively.
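To make the idea of computing unweighted shortest paths with Boolean vector-matrix operations concrete, the following is a minimal sketch, not the authors' BOVM or SOVM kernels. It assumes the graph is given as adjacency lists (one list per row of the Boolean adjacency matrix), and the names `boolean_sssp` and `boolean_apsp` are hypothetical. Each iteration ORs together the rows selected by the current frontier vector; a node reached for the first time at step $k$ has shortest path length $k$.

```python
def boolean_sssp(adj, source):
    """Unweighted SSSP via repeated sparse Boolean vector-matrix products.

    adj[u] lists the out-neighbours of u, i.e. the nonzero columns of row u
    of the Boolean adjacency matrix (an assumed input layout).
    Returns dist, where dist[v] is the hop count from source or -1 if
    v is unreachable.
    """
    n = len(adj)
    dist = [-1] * n
    dist[source] = 0
    frontier = [source]  # sparse form of the Boolean frontier vector
    step = 0
    while frontier:
        step += 1
        next_frontier = []
        for u in frontier:          # OR together the rows picked by the frontier
            for v in adj[u]:
                if dist[v] == -1:   # first time reached: shortest length found
                    dist[v] = step
                    next_frontier.append(v)
        frontier = next_frontier    # stop when no new node is discovered
    return dist


def boolean_apsp(adj):
    """All-pairs variant: one independent SSSP run per source node."""
    return [boolean_sssp(adj, s) for s in range(len(adj))]


if __name__ == "__main__":
    # Small directed example graph (made up for illustration).
    example = [[1, 2], [3], [3], [], [0]]
    print(boolean_sssp(example, 0))
    print(boolean_apsp(example)[4])
```

In this sketch every node enters the frontier at most once, so each edge leaving a reachable node is scanned at most once per source. This is in the spirit of the per-source edge bound quoted above, though the constants and the parallel execution strategy of the actual algorithm are not modelled here. The per-source runs in `boolean_apsp` are independent, which is what makes the all-pairs case straightforward to parallelize.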
