Efficient GPU Implementation of Static and Incrementally Expanding DF-P PageRank for Dynamic Graphs

Subhajit Sahu

TL;DR

This work presents an efficient GPU implementation of Static PageRank and of its dynamic counterpart, Dynamic Frontier with Pruning (DF-P) PageRank, for dynamic graphs. Static PageRank uses a synchronous, pull-based, atomics-free computation in which low and high in-degree vertices are partitioned and processed by two separate kernels; it also eliminates dead-end teleport contributions. DF-P PageRank builds on Static PageRank by incrementally expanding or contracting the set of vertices likely to change rank, using an additional partition by out-degree and two additional kernels. On a server with an NVIDIA A100 GPU, Static PageRank outperforms the Hornet and Gunrock PageRank implementations by 31× and 5.9×, respectively, and DF-P PageRank outperforms Static PageRank by 2.1× on real-world dynamic graphs and by 3.1× on large static graphs with random batch updates.

Abstract

PageRank is a widely used centrality measure that "ranks" vertices in a graph by considering the connections and their importance. In this report, we first introduce one of the most efficient GPU implementations of Static PageRank, which recomputes PageRank scores from scratch. It uses a synchronous pull-based atomics-free PageRank computation, with the low and high in-degree vertices being partitioned and processed by two separate kernels. Next, we present our GPU implementation of incrementally expanding (and contracting) Dynamic Frontier with Pruning (DF-P) PageRank, which processes only a subset of vertices likely to change ranks. It is based on Static PageRank, and uses an additional partitioning between low and high out-degree vertices for incremental expansion of the set of affected vertices with two additional kernels. On a server with an NVIDIA A100 GPU, our Static PageRank outperforms Hornet and Gunrock's PageRank implementations by 31x and 5.9x, respectively. In addition, DF-P PageRank outperforms Static PageRank by 2.1x on real-world dynamic graphs, and by 3.1x on large static graphs with random batch updates.
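
The synchronous, pull-based scheme described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU code: the function names, the in-degree threshold of 64, the damping factor of 0.85, and the convergence tolerance are all assumptions chosen for illustration, and the sketch assumes the graph has no dead ends (every vertex has at least one out-edge). Each vertex pulls contributions from its in-neighbors and writes only its own new rank, which is why no atomic updates are needed.

```python
def partition_by_in_degree(in_neighbors, threshold=64):
    """Split vertex ids into low and high in-degree groups.

    The threshold is an assumed value, not one taken from the paper.
    """
    low = [v for v, nbrs in enumerate(in_neighbors) if len(nbrs) < threshold]
    high = [v for v, nbrs in enumerate(in_neighbors) if len(nbrs) >= threshold]
    return low, high


def static_pagerank(in_neighbors, out_degree, alpha=0.85, tol=1e-10,
                    max_iter=500, threshold=64):
    """Synchronous pull-based PageRank on a graph without dead ends.

    in_neighbors[v] lists the vertices u that have an edge u -> v.
    out_degree[u] must be at least 1 for every vertex (assumption).
    """
    n = len(in_neighbors)
    ranks = [1.0 / n] * n
    low, high = partition_by_in_degree(in_neighbors, threshold)
    for _ in range(max_iter):
        new_ranks = [0.0] * n
        # On a GPU, each group would be handled by its own kernel;
        # here both groups run the same sequential loop.
        for group in (low, high):
            for v in group:
                # Pull step: v reads its in-neighbors' previous ranks
                # and is the only writer of new_ranks[v].
                pulled = sum(ranks[u] / out_degree[u] for u in in_neighbors[v])
                new_ranks[v] = (1.0 - alpha) / n + alpha * pulled
        delta = max(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks  # synchronous: swap only after a full pass
        if delta <= tol:
            break
    return ranks
```

For example, on the graph with edges 0→1, 0→2, 1→2, 2→0, calling `static_pagerank([[2], [0], [0, 1]], [2, 1, 1])` returns ranks that sum to 1, with vertex 2 ranked above vertex 1. The partition only changes how work would be scheduled, so any threshold yields the same ranks.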

Paper Structure

This paper contains 43 sections, 2 equations, 13 figures, 4 tables, and 5 algorithms.

Figures (13)

  • Figure 1: Mean relative runtime with our Dynamic Frontier (DF) and Dynamic Frontier with Pruning (DF-P) approaches across three different levels of work-partitioning for GPU computation. Here, Partition $G$ denotes partitioning the vertices of the current graph $G$ by their out-degree, while Partition $G'$ signifies partitioning the vertices by their in-degree. Note that $G'$ stands for the transpose of the current graph $G$.
  • Figure 2: Runtime in seconds and speedup (log-scale) with Hornet, Gunrock, and our Static PageRank for each graph in the dataset.
  • Figure 3: Mean Runtime and Error in ranks obtained with our GPU implementation of Static, Naive-dynamic (ND), Dynamic Traversal (DT), Dynamic Frontier (DF), and Dynamic Frontier with Pruning (DF-P) PageRank on real-world dynamic graphs, with batch updates of size $10^{-5}|E_T|$ to $10^{-3}|E_T|$. Here, (a) and (b) show the overall runtime and error across all temporal graphs, while (c) and (d) show the runtime and rank error for each approach (relative to reference Static PageRank, see Section \ref{sec:measurement}). In (a), the speedup of each approach with respect to Static PageRank is labeled.
  • Figure 4: Runtime (logarithmic scale) of GPU implementation for Static, Naive-dynamic (ND), Dynamic Traversal (DT), Dynamic Frontier (DF), and Dynamic Frontier with Pruning (DF-P) PageRank on large (static) graphs with generated random batch updates. Batch updates range in size from $10^{-7}|E|$ to $0.1|E|$ in multiples of $10$. These updates consist of $80\%$ edge insertions and $20\%$ edge deletions, mimicking realistic changes in a dynamic graph scenario. The right subfigure illustrates the runtime of each approach for individual graphs in the dataset, while the left subfigure presents overall runtimes (using geometric mean for consistent scaling across graphs). Additionally, the speedup of each approach relative to Static PageRank is labeled.
  • Figure 5: Error comparison of our GPU implementation of Static, Naive-dynamic (ND), Dynamic Traversal (DT), Dynamic Frontier (DF), and Dynamic Frontier with Pruning (DF-P) PageRank on large (static) graphs with generated random batch updates, relative to a Reference Static PageRank (see Section \ref{sec:measurement}), using $L1$-norm. The size of batch updates ranges from $10^{-7} |E|$ to $0.1 |E|$ in multiples of $10$ (logarithmic scale), consisting of $80\%$ edge insertions and $20\%$ edge deletions to simulate realistic dynamic graph updates. The right subfigure depicts the error for each approach in relation to each graph, while the left subfigure showcases overall errors using geometric mean for consistent scaling across graphs.
  • ...and 8 more figures
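
The captions of Figures 4 and 5 describe random batch updates made of 80% edge insertions and 20% edge deletions, with rank error measured in the $L1$-norm against a reference. A minimal sketch of both ideas follows. The helper names `random_batch_update` and `l1_error` are hypothetical, and the sampling strategy (uniform over existing edges for deletions, uniform over absent vertex pairs for insertions) is an assumption, since the captions do not state how edges are chosen.

```python
import random


def random_batch_update(edges, num_vertices, batch_fraction,
                        insert_share=0.8, seed=0):
    """Generate (deletions, insertions) for one batch update.

    edges is a set of (u, v) pairs. batch_fraction is the batch size as a
    fraction of |E|, e.g. 1e-7 to 0.1. Uniform sampling is an assumption.
    """
    rng = random.Random(seed)
    edge_list = sorted(edges)
    batch_size = max(1, round(batch_fraction * len(edge_list)))
    num_insertions = round(insert_share * batch_size)
    num_deletions = min(batch_size - num_insertions, len(edge_list))
    # Guard against looping forever on a (nearly) complete graph.
    if num_vertices * num_vertices - len(edges) < num_insertions:
        raise ValueError("not enough absent edges to insert")
    deletions = rng.sample(edge_list, num_deletions)
    insertions = set()
    while len(insertions) < num_insertions:
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        if (u, v) not in edges:  # only insert edges that do not exist yet
            insertions.add((u, v))
    return deletions, sorted(insertions)


def l1_error(ranks, reference):
    """L1-norm of the difference between two rank vectors."""
    return sum(abs(r - s) for r, s in zip(ranks, reference))
```

With a 100-edge graph and `batch_fraction=0.1`, this yields a batch of 8 insertions and 2 deletions. `l1_error` can then compare the ranks produced by any approach against those of a reference run on the updated graph.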