Efficient GPU Implementation of Static and Incrementally Expanding DF-P PageRank for Dynamic Graphs
Subhajit Sahu
TL;DR
This work presents a highly efficient GPU implementation of Static PageRank and of its dynamic extension, Dynamic Frontier with Pruning (DF-P) PageRank, for dynamic graphs. Static PageRank uses a synchronous, pull-based, atomics-free computation that eliminates dead-end teleport contributions, with low and high in-degree vertices partitioned and processed by two separate kernels. DF-P PageRank builds on it by processing only the vertices likely to change rank. It incrementally expands (or prunes) the set of affected vertices, using an additional partitioning of low and high out-degree vertices and two additional kernels. On an NVIDIA A100 GPU, Static PageRank outperforms Hornet and Gunrock by 31× and 5.9×, respectively. DF-P PageRank in turn outperforms Static PageRank by 2.1× on real-world dynamic graphs and by 3.1× on large static graphs with random batch updates, which is relevant to real-time dynamic graph analytics.
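The static computation summarized above can be illustrated with a small sequential Python sketch. It is not the GPU implementation. The in-degree `threshold`, the tolerance, and the iteration cap are assumed values. The graph is also assumed to have no dead ends, for example because every vertex carries a self-loop. The split into low and high in-degree vertices only mirrors how work could be divided between two kernels (for instance, thread-per-vertex and block-per-vertex). Here both partitions run one after the other.

```python
from typing import List, Sequence, Tuple


def partition_by_in_degree(in_edges: Sequence[Sequence[int]],
                           threshold: int) -> Tuple[List[int], List[int]]:
    """Split vertex IDs into low and high in-degree sets.

    On a GPU, the low set could be mapped to one kernel (e.g. one thread per
    vertex) and the high set to another (e.g. one thread block per vertex).
    """
    low = [v for v, srcs in enumerate(in_edges) if len(srcs) < threshold]
    high = [v for v, srcs in enumerate(in_edges) if len(srcs) >= threshold]
    return low, high


def static_pagerank(in_edges: Sequence[Sequence[int]], out_degree: Sequence[int],
                    alpha: float = 0.85, tol: float = 1e-10, max_iter: int = 500,
                    threshold: int = 64) -> List[float]:
    """Synchronous pull-based PageRank (assumes no dead ends, e.g. self-loops)."""
    n = len(in_edges)
    low, high = partition_by_in_degree(in_edges, threshold)  # threshold is hypothetical
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_new = [0.0] * n
        # Each vertex reads only the previous iteration's ranks and writes only
        # its own slot, so neither pass needs atomic updates.
        for part in (low, high):
            for v in part:
                pulled = sum(r[u] / out_degree[u] for u in in_edges[v])
                r_new[v] = (1 - alpha) / n + alpha * pulled
        delta = max(abs(a - b) for a, b in zip(r_new, r))  # L-infinity norm of the change
        r = r_new
        if delta < tol:
            break
    return r


# Example: 4 vertices, each with a self-loop, plus edges 0->1, 1->2, 2->0, 3->0.
in_edges = [[0, 2, 3], [0, 1], [1, 2], [3]]
out_degree = [2, 2, 2, 2]
ranks = static_pagerank(in_edges, out_degree, threshold=3)
```

With `threshold=3`, only vertex 0 (in-degree 3) falls into the high in-degree set. Because no vertex is a dead end, the ranks keep summing to 1 in every iteration.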
Abstract
PageRank is a widely used centrality measure that "ranks" vertices in a graph by considering both the links between vertices and the importance of the linking vertices. In this report, we first introduce one of the most efficient GPU implementations of Static PageRank, which recomputes PageRank scores from scratch. It uses a synchronous, pull-based, atomics-free PageRank computation, in which low and high in-degree vertices are partitioned and processed by two separate kernels. Next, we present our GPU implementation of incrementally expanding (and contracting) Dynamic Frontier with Pruning (DF-P) PageRank, which processes only the subset of vertices likely to change rank. It builds on Static PageRank and uses an additional partitioning between low and high out-degree vertices, with two additional kernels, to incrementally expand the set of affected vertices. On a server with an NVIDIA A100 GPU, our Static PageRank outperforms the PageRank implementations of Hornet and Gunrock by 31× and 5.9×, respectively. Furthermore, DF-P PageRank outperforms Static PageRank by 2.1× on real-world dynamic graphs and by 3.1× on large static graphs with random batch updates.
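The incremental expansion and pruning described above can be sketched sequentially in Python. This is a minimal sketch under stated assumptions, not the authors' GPU code. The vertex set is assumed fixed across the batch, and every vertex is assumed to have a self-loop, so there are no dead ends. The initial frontier is taken to be the out-neighbours of each updated edge's source in both the old and the new graph; this is an assumption of the sketch. The parameter names `frontier_tol`, `prune_tol` and `threshold`, and their values, are hypothetical. A vertex leaves the affected set when its relative rank change is at most `prune_tol`. Its out-neighbours are marked affected when that change exceeds `frontier_tol`. The helper `static_ranks` is a plain power iteration, used only to obtain previous ranks.

```python
from typing import List, Sequence, Tuple

Graph = List[List[int]]  # out_edges[u] = out-neighbours of u (self-loops included)


def in_edges_of(out_edges: Graph) -> Graph:
    """Transpose adjacency lists so each vertex can pull from its in-neighbours."""
    ins: Graph = [[] for _ in out_edges]
    for u, outs in enumerate(out_edges):
        for v in outs:
            ins[v].append(u)
    return ins


def static_ranks(out_edges: Graph, alpha: float = 0.85, tol: float = 1e-10,
                 max_iter: int = 500) -> List[float]:
    """Plain synchronous pull-based power iteration (reference ranks only)."""
    n = len(out_edges)
    ins = in_edges_of(out_edges)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_new = [(1 - alpha) / n + alpha * sum(r[u] / len(out_edges[u]) for u in ins[v])
                 for v in range(n)]
        delta = max(abs(a - b) for a, b in zip(r_new, r))
        r = r_new
        if delta < tol:
            break
    return r


def df_p_pagerank(old_out: Graph, new_out: Graph, prev: Sequence[float],
                  batch: Sequence[Tuple[int, int]], alpha: float = 0.85,
                  tol: float = 1e-10, frontier_tol: float = 1e-9,
                  prune_tol: float = 1e-9, max_iter: int = 500,
                  threshold: int = 64) -> List[float]:
    """Hypothetical DF-P sketch: update ranks only for vertices in the frontier."""
    n = len(new_out)
    ins = in_edges_of(new_out)
    affected = [False] * n
    # Initial frontier: out-neighbours of each updated edge's source, in both the
    # old and the new graph (so targets of deleted edges are included too).
    for u, _ in batch:
        for v in old_out[u] + new_out[u]:
            affected[v] = True
    r = list(prev)
    for _ in range(max_iter):
        r_new = list(r)          # unaffected vertices keep their previous rank
        expand: List[int] = []   # vertices whose change is large enough to spread
        delta = 0.0
        for v in range(n):
            if not affected[v]:
                continue
            r_new[v] = (1 - alpha) / n + alpha * sum(r[u] / len(new_out[u]) for u in ins[v])
            change = abs(r_new[v] - r[v])
            rel = change / max(r_new[v], r[v])
            delta = max(delta, change)
            if rel <= prune_tol:
                affected[v] = False  # pruning: contract the frontier
            if rel > frontier_tol:
                expand.append(v)     # expansion: mark out-neighbours below
        # On a GPU, low and high out-degree vertices could be marked by two
        # separate kernels; here the split only documents that idea.
        low = [v for v in expand if len(new_out[v]) < threshold]
        high = [v for v in expand if len(new_out[v]) >= threshold]
        for v in low + high:
            for w in new_out[v]:
                affected[w] = True
        r = r_new
        if delta <= tol:
            break
    return r


# Example: vertices 4 and 5 form a separate component untouched by the batch.
old_out = [[0, 1], [1, 2], [2, 0], [3, 0], [4, 5], [5, 4]]
new_out = [[0, 1], [1, 2], [2], [3, 0, 1], [4, 5], [5, 4]]
batch = [(2, 0), (3, 1)]  # delete 2->0, insert 3->1
prev = static_ranks(old_out)
updated = df_p_pagerank(old_out, new_out, prev, batch)
```

In this example, vertices 4 and 5 are never marked affected, so they keep their previous ranks exactly. The ranks of vertices 0 to 3 should agree closely with a from-scratch recomputation on the updated graph.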
