Engineering A Workload-balanced Push-Relabel Algorithm for Massive Graphs on GPUs
Chou-Ying Hsieh, Po-Chieh Lin, Sy-Yen Kuo
TL;DR
The paper tackles the memory and workload-imbalance bottlenecks of GPU-accelerated push-relabel on massive graphs for maximum flow/minimum cut problems. It introduces two enhanced CSR formats, RCSR and BCSR, and a vertex-centric, two-level parallelism scheme driven by an active-vertex queue to balance work and improve locality. Memory usage drops from $O(V^2)$ to $O(V+E)$, and on real-world graphs the method achieves speedups of up to 7.31x in maximum flow and 2.29x in bipartite matching over the state of the art. The results demonstrate scalable performance on modern GPUs, and the authors commit to open-sourcing the implementation for broader adoption and further research.
Abstract
The push-relabel algorithm is an efficient algorithm for the maximum flow/minimum cut problems, and it lends itself well to parallelization. As the size of graphs grows exponentially, researchers have used Graphics Processing Units (GPUs) to further accelerate the push-relabel algorithm. However, prior works must cope with the significant memory consumption required to represent a massive residual graph. In addition, their algorithms have an inherently imbalanced workload distribution on GPUs. This paper first identifies these two challenges using memory and computational models. Based on the analysis of these models, we propose a workload-balanced push-relabel algorithm (WBPR) with two enhanced compressed sparse representations (CSR) and a vertex-centric approach. The enhanced CSR significantly reduces memory consumption, while the vertex-centric approach alleviates the workload imbalance and improves GPU utilization. In our experiments, our approach reduces memory consumption from $O(V^2)$ to $O(V + E)$. Moreover, we achieve up to 7.31x and 2.29x runtime speedups over the state of the art on real-world graphs in maximum flow and bipartite matching tasks, respectively. Our code will be open-sourced for further research on accelerating the push-relabel algorithm.
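To make the $O(V + E)$ memory claim and the role of an active-vertex queue concrete, the following is a minimal sequential sketch in Python. It is not the authors' WBPR, RCSR, or BCSR; it only assumes a plain CSR layout in which every edge is stored as a forward arc plus a reverse arc, with a hypothetical `reverse` array linking the two. All function and variable names are illustrative.

```python
from collections import deque

def build_csr(num_vertices, edges):
    """Store the residual graph in CSR form: O(V + E) memory instead of a V x V matrix.

    Each (u, v, cap) edge becomes a forward arc u->v and a reverse arc v->u.
    """
    degree = [0] * num_vertices
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (num_vertices + 1)
    for u in range(num_vertices):
        offsets[u + 1] = offsets[u] + degree[u]
    cursor = offsets[:-1]                      # next free slot per vertex (a copy)
    targets = [0] * (2 * len(edges))
    residual = [0] * (2 * len(edges))
    reverse = [0] * (2 * len(edges))           # index of the paired arc (assumed layout)
    for u, v, cap in edges:
        i = cursor[u]; cursor[u] += 1
        j = cursor[v]; cursor[v] += 1
        targets[i], residual[i], reverse[i] = v, cap, j
        targets[j], residual[j], reverse[j] = u, 0, i
    return offsets, targets, residual, reverse

def max_flow(num_vertices, edges, source, sink):
    """Sequential FIFO push-relabel driven by a queue of active vertices."""
    offsets, targets, residual, reverse = build_csr(num_vertices, edges)
    height = [0] * num_vertices
    excess = [0] * num_vertices
    queued = [False] * num_vertices
    height[source] = num_vertices
    active = deque()

    def push(u, a, delta):
        v = targets[a]
        residual[a] -= delta
        residual[reverse[a]] += delta
        excess[u] -= delta
        excess[v] += delta
        if v != source and v != sink and not queued[v]:
            queued[v] = True
            active.append(v)

    # Preflow: saturate every arc leaving the source.
    for a in range(offsets[source], offsets[source + 1]):
        if residual[a] > 0:
            push(source, a, residual[a])

    while active:
        u = active.popleft()
        queued[u] = False
        while excess[u] > 0:                   # discharge u completely
            for a in range(offsets[u], offsets[u + 1]):
                if residual[a] > 0 and height[u] == height[targets[a]] + 1:
                    push(u, a, min(excess[u], residual[a]))
                    if excess[u] == 0:
                        break
            if excess[u] > 0:                  # no admissible arc left: relabel
                height[u] = 1 + min(
                    height[targets[a]]
                    for a in range(offsets[u], offsets[u + 1])
                    if residual[a] > 0
                )
    return excess[sink]

example_edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
example_flow = max_flow(4, example_edges, source=0, sink=3)
```

In this sketch the four arrays have lengths $V + 1$, $2E$, $2E$, and $2E$, which is where the $O(V + E)$ bound comes from. A GPU version would process many active vertices from the queue concurrently, which this sequential loop does not attempt to model.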
