An Incrementally Expanding Approach for Updating PageRank on Dynamic Graphs

Subhajit Sahu

TL;DR

Given a batch update of edge deletions and insertions, the Dynamic Frontier approach progressively identifies, with minimal overhead, the affected vertices whose ranks are likely to change. It improves performance at an average rate of 1.8x for every doubling of threads.

Abstract

PageRank is a popular centrality metric that assigns importance to each vertex of a graph based on its neighbors and their scores. Efficient parallel algorithms for updating PageRank on dynamic graphs are crucial for various applications, especially as dataset sizes have reached substantial scales. This technical report presents our Dynamic Frontier approach. Given a batch update of edge deletions and insertions, it progressively identifies, with minimal overhead, the affected vertices whose ranks are likely to change. On a server equipped with a 64-core AMD EPYC-7742 processor, our Dynamic Frontier PageRank outperforms Static, Naive-dynamic, and Dynamic Traversal PageRank by 7.8x, 2.9x, and 3.9x respectively, on uniformly random batch updates of size 10^-7 |E| to 10^-3 |E|. In addition, our approach improves performance at an average rate of 1.8x for every doubling of threads.

Paper Structure (34 sections, 1 equation, 14 figures, 1 table, 1 algorithm)


Figures (14)

  • Figure 1: Illustration of the Dynamic Frontier approach through a specific example. The initial graph consists of $16$ vertices and $25$ edges. The graph is then updated with an edge insertion $(4, 12)$ and an edge deletion $(2, 1)$. Accordingly, the outgoing neighbors of vertices $4$ ($3$ and $12$) and $2$ ($1$, $4$, and $8$) are marked as affected (shown with yellow fill). When the ranks of these affected vertices are computed in the first iteration, the change in rank of vertices $1$ and $12$ is found to exceed the frontier tolerance $\tau_f$ (shown with red border). Thus, the outgoing neighbors of vertices $1$ ($3$ and $5$) and $12$ ($11$ and $14$) are also marked as affected. In the second iteration, the change in rank of vertices $3$, $5$, $11$, and $14$ is greater than $\tau_f$, so their outgoing neighbors are marked as affected as well. In each subsequent iteration, the ranks of all affected vertices are updated again. Once the change in rank of every vertex is within the iteration tolerance $\tau$, the ranks have converged and the algorithm terminates.
  • Figure 2: Average relative runtime of asynchronous implementations of the Static, Naive-dynamic, Dynamic Traversal, and Dynamic Frontier approaches, compared to their respective synchronous implementations, on batch updates of size $10^{-7}|E|$ to $0.1|E|$ (right), and overall (left). The results indicate that asynchronous implementations are faster than synchronous ones, especially for smaller batch sizes. This is due to somewhat faster convergence and the absence of copy overhead (for the Dynamic Traversal and Dynamic Frontier approaches).
  • Figure 3: Average relative runtime and error in ranks (with respect to ranks obtained with Reference Static PageRank) using the Dynamic Frontier approach, with frontier tolerance $\tau_f$ varying from $\tau$ to $\tau / 10^5$, on batch updates of size $10^{-7}|E|$ to $0.1|E|$. The figures indicate that increasing $\tau_f$ reduces runtime but also increases the error. Frontier tolerances of $\tau/10^4$ and $\tau/10^5$ yield ranks with lower error than Static PageRank and are thus acceptable (we choose $\tau_f = \tau/10^5$ to be on the safe side).
  • Figure 4: Runtime (logarithmic scale) for Static, Naive-dynamic, Dynamic Traversal, and Dynamic Frontier PageRank with batch updates exclusively comprising edge insertions, ranging from $10^{-7} |E|$ to $0.1 |E|$ in multiples of $10$ (logarithmic scale). The right figure details the runtime of each approach for individual graphs in the dataset, while the left figure displays overall runtimes --- using geometric mean for consistent scaling across graphs.
  • Figure 5: Speedup of Dynamic Frontier PageRank with respect to Static, Naive-dynamic, and Dynamic Traversal PageRank, on batch updates consisting solely of edge insertions ranging from $10^{-7} |E|$ to $0.1 |E|$ (logarithmic scale). The right figure depicts the speedup of Dynamic Frontier PageRank in relation to each approach for individual graphs in the dataset, while the left figure highlights the overall speedup.
  • ...and 9 more figures
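
The frontier expansion described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal, sequential illustration under stated assumptions, not the authors' parallel implementation. The function names (`static_pagerank`, `dynamic_frontier_pagerank`, `_iterate`), the damping factor of 0.85, the iteration tolerance of 1e-10, and the self-loops added to the toy graph (so that no vertex is a dead end) are all choices made for this example. The initial marking rule is also a simplification: for each updated edge (u, v), it marks v and the outgoing neighbors of u in the updated graph.

```python
def _build(n, edges):
    """Build out- and in-neighbor lists for a directed graph on vertices 0..n-1."""
    out_nbrs = [[] for _ in range(n)]
    in_nbrs = [[] for _ in range(n)]
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    return out_nbrs, in_nbrs


def _iterate(n, out_nbrs, in_nbrs, ranks, affected, alpha, tol, frontier_tol, max_iter):
    """Asynchronous (in-place) PageRank iterations restricted to affected vertices."""
    for _ in range(max_iter):
        max_change = 0.0
        for v in range(n):
            if not affected[v]:
                continue
            # Standard PageRank update; u is in in_nbrs[v] only if u has an out-edge.
            r = (1 - alpha) / n + alpha * sum(
                ranks[u] / len(out_nbrs[u]) for u in in_nbrs[v])
            change = abs(r - ranks[v])
            ranks[v] = r
            max_change = max(max_change, change)
            # Expand the frontier only when the change exceeds tau_f.
            if change > frontier_tol:
                for w in out_nbrs[v]:
                    affected[w] = True
        if max_change <= tol:  # every processed vertex is within tau
            break
    return ranks


def static_pagerank(n, edges, alpha=0.85, tol=1e-10, max_iter=500):
    """Recompute ranks from scratch: every vertex is affected from the start."""
    out_nbrs, in_nbrs = _build(n, edges)
    return _iterate(n, out_nbrs, in_nbrs, [1.0 / n] * n, [True] * n,
                    alpha, tol, tol, max_iter)


def dynamic_frontier_pagerank(n, new_edges, prev_ranks, deletions, insertions,
                              alpha=0.85, tol=1e-10, frontier_tol=None, max_iter=500):
    """Update prev_ranks after a batch update, starting from a small frontier."""
    if frontier_tol is None:
        frontier_tol = tol / 1e5  # tau_f = tau / 10^5, the value chosen in Figure 3
    out_nbrs, in_nbrs = _build(n, new_edges)
    affected = [False] * n
    for u, v in list(deletions) + list(insertions):
        affected[v] = True            # assumption: also mark the edge target
        for w in out_nbrs[u]:         # outgoing neighbors of the source vertex
            affected[w] = True
    return _iterate(n, out_nbrs, in_nbrs, list(prev_ranks), affected,
                    alpha, tol, frontier_tol, max_iter)


# Toy graph (hypothetical): a 6-vertex ring with self-loops and one chord.
n = 6
old_edges = [(i, i) for i in range(n)] + [(i, (i + 1) % n) for i in range(n)] + [(0, 3)]
deletions, insertions = [(0, 3)], [(2, 5)]
new_edges = [e for e in old_edges if e not in deletions] + insertions

prev = static_pagerank(n, old_edges)
dyn = dynamic_frontier_pagerank(n, new_edges, prev, deletions, insertions)
ref = static_pagerank(n, new_edges)
print(max(abs(a - b) for a, b in zip(dyn, ref)))
```

On this toy graph the printed value is the largest difference between the incrementally updated ranks and a full recomputation. Updating `ranks` in place mirrors the asynchronous variant discussed in Figure 2, which avoids copying between two rank vectors.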