Engineering A Workload-balanced Push-Relabel Algorithm for Massive Graphs on GPUs

Chou-Ying Hsieh, Po-Chieh Lin, Sy-Yen Kuo

TL;DR

The paper tackles memory and workload-imbalance bottlenecks in GPU-accelerated push-relabel for massive graphs in maximum flow/minimum cut problems. It introduces two enhanced CSR formats, RCSR and BCSR, and a vertex-centric two-level parallelism driven by an active-vertex queue to balance work and improve locality. Memory usage is reduced from $O(V^2)$ to $O(V+E)$, with real-world graphs achieving up to about 7.31x speedups in maximum flow and 2.29x in bipartite matching. The results demonstrate scalable performance on modern GPUs and the authors commit to open-sourcing the implementation for broader adoption and further research.

Abstract

The push-relabel algorithm is an efficient algorithm that solves the maximum flow/minimum cut problems and lends itself to parallelization. As the size of graphs grows exponentially, researchers have used Graphics Processing Units (GPUs) to further accelerate the push-relabel algorithm. However, prior works must cope with the significant memory consumption required to represent a massive residual graph. In addition, their algorithms inherently distribute work unevenly on GPUs. This paper first identifies these two challenges through memory and computational models. Based on the analysis of these models, we propose a workload-balanced push-relabel algorithm (WBPR) with two enhanced compressed sparse representations (CSR) and a vertex-centric approach. The enhanced CSR significantly reduces memory consumption, while the vertex-centric approach alleviates the workload imbalance and improves GPU utilization. In our experiments, our approach reduces the memory consumption from $O(V^2)$ to $O(V+E)$. Moreover, we achieve up to 7.31x and 2.29x runtime speedup over the state-of-the-art on real-world graphs in maximum flow and bipartite matching tasks, respectively. Our code will be open-sourced for further research on accelerating the push-relabel algorithm.
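
To give a feel for the $O(V^2)$ versus $O(V+E)$ gap, the sketch below counts stored entries for two layouts. It assumes that the $O(V^2)$ figure corresponds to a dense $V \times V$ residual-capacity matrix and that the $O(V+E)$ figure corresponds to a plain CSR holding both arc directions. The function names and the graph size in the usage note are hypothetical and do not come from the paper.

```python
def dense_entries(num_vertices: int) -> int:
    """Entries in a dense V x V residual-capacity matrix (assumed baseline)."""
    return num_vertices * num_vertices


def csr_entries(num_vertices: int, num_edges: int) -> int:
    """Entries in a plain CSR of the residual graph.

    Assumed layout: V + 1 row offsets, plus one column index and one
    capacity per arc, where every edge contributes a forward and a
    backward arc (2E arcs in total).
    """
    arcs = 2 * num_edges
    return (num_vertices + 1) + 2 * arcs
```

For a hypothetical graph with one million vertices and ten million edges, `dense_entries` gives $10^{12}$ entries, while `csr_entries` gives about $4.1 \times 10^{7}$.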

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, 2 tables, 2 algorithms.

Figures (3)

  • Figure 1: Illustration of one iteration of the push-relabel algorithm in (a) a thread-centric manner and (b) a vertex-centric manner. Both approaches first check whether a vertex is active (using $e(u)$ and $h(u)$), then find its minimum-height neighbor and either push (updating $e(u)$ and $cf(u,v)$) or relabel (updating $h(u)$). The vertex-centric approach uses the active-vertex queue (AVQ) to collect all active vertices, so that it can assign more threads (a tile) to finding a minimum-height neighbor.
  • Figure 2: (a) The example residual graph. (b) The original CSR. (c) The reversed CSR (RCSR). (d) The bidirectional CSR (BCSR). The blue blocks are the flows of backward edges, while the red ones mark all neighbors of vertex $2$ in the residual graph. The orange ones represent the edges to scan when finding the minimum-height neighbor of vertex $2$. The green part is the cost of finding the backward flow for the given edge $(2, 4)$.
  • Figure 3: The workload distribution of the bipartite matching problem across 13 bipartite graphs.
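
The iteration described in the Figure 1 caption (check that a vertex is active, find its minimum-height neighbor, then push or relabel) can be sketched sequentially as follows. This is a minimal illustration, not the authors' GPU kernel: it runs on one CPU thread, uses a plain CSR with an assumed `rev` array that stores the position of each arc's backward arc (it is neither RCSR nor BCSR), and replaces the tile of threads with an ordinary loop. All function and variable names are hypothetical.

```python
from collections import deque


def build_residual_csr(n, edges):
    """Build CSR arrays; each (u, v, cap) adds arc u->v and a 0-capacity arc v->u."""
    arcs = []
    for u, v, c in edges:
        arcs.append((u, v, c))
        arcs.append((v, u, 0))
    offsets = [0] * (n + 1)
    for tail, _, _ in arcs:
        offsets[tail + 1] += 1
    for i in range(n):
        offsets[i + 1] += offsets[i]
    nxt = offsets[:n]  # next free CSR slot per vertex (slice makes a copy)
    heads, caps, slot = [0] * len(arcs), [0] * len(arcs), [0] * len(arcs)
    for i, (tail, head, c) in enumerate(arcs):
        j = nxt[tail]
        nxt[tail] += 1
        heads[j], caps[j], slot[i] = head, c, j
    rev = [0] * len(arcs)
    for i in range(0, len(arcs), 2):  # arcs i and i + 1 are a forward/backward pair
        rev[slot[i]], rev[slot[i + 1]] = slot[i + 1], slot[i]
    return offsets, heads, caps, rev


def max_flow(n, edges, s, t):
    """Push-relabel driven by a queue of active vertices (sequential sketch)."""
    offsets, heads, caps, rev = build_residual_csr(n, edges)
    height, excess = [0] * n, [0] * n
    in_queue = [False] * n
    height[s] = n
    active = deque()

    def activate(v):
        if v != s and v != t and not in_queue[v]:
            in_queue[v] = True
            active.append(v)

    # Preflow: saturate every arc leaving the source.
    for j in range(offsets[s], offsets[s + 1]):
        c = caps[j]
        if c > 0:
            caps[j] -= c
            caps[rev[j]] += c
            excess[heads[j]] += c
            activate(heads[j])

    while active:
        u = active.popleft()
        in_queue[u] = False
        # Scan u's residual arcs for the minimum-height neighbor.
        best = -1
        for j in range(offsets[u], offsets[u + 1]):
            if caps[j] > 0 and (best < 0 or height[heads[j]] < height[heads[best]]):
                best = j
        if best < 0:
            continue  # no residual arc; not expected for an active vertex
        v = heads[best]
        if height[u] > height[v]:
            d = min(excess[u], caps[best])  # push
            caps[best] -= d
            caps[rev[best]] += d
            excess[u] -= d
            excess[v] += d
            activate(v)
        else:
            height[u] = height[v] + 1  # relabel
        if excess[u] > 0:
            activate(u)
    return excess[t]
```

In this layout the `rev` array makes the backward-arc lookup a single index operation at the cost of one extra integer per arc; without it, finding the backward arc of $(u, v)$ would require searching the row of $v$.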