Efficient Algorithms for Personalized PageRank Computation: A Survey

Mingji Yang; Hanzhi Wang; Zhewei Wei; Sibo Wang; Ji-Rong Wen

TL;DR

Personalized PageRank (PPR) is a traditional measure for node proximity on large graphs that reflects the importance between a pair of nodes $s$ and $t$ in a bidirectional way.

Abstract

Personalized PageRank (PPR) is a traditional measure for node proximity on large graphs. For a pair of nodes $s$ and $t$, the PPR value $π_s(t)$ equals the probability that an $α$-discounted random walk from $s$ terminates at $t$ and reflects the importance between $s$ and $t$ in a bidirectional way. As a generalization of Google's celebrated PageRank centrality, PPR has been extensively studied and has found multifaceted applications in many fields, such as network analysis, graph mining, and graph machine learning. Despite numerous studies devoted to PPR over the decades, efficient computation of PPR remains a challenging problem, and there is a dearth of systematic summaries and comparisons of existing algorithms. In this paper, we recap several frequently used techniques for PPR computation and conduct a comprehensive survey of various recent PPR algorithms from an algorithmic perspective. We classify these approaches based on the types of queries they address and review their methodologies and contributions. We also discuss some representative algorithms for computing PPR on dynamic graphs and in parallel or distributed environments.
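
The random-walk definition in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the survey's algorithm listing: the function name `monte_carlo_ppr`, the adjacency-list dictionary `graph`, and the default parameters are assumptions made for illustration. It also assumes that every node has at least one out-neighbor and that, at each step, the walk terminates with probability $α$ and otherwise moves to a uniformly random out-neighbor.

```python
import random
from collections import Counter


def monte_carlo_ppr(graph, s, alpha=0.2, num_walks=10000, seed=0):
    """Estimate pi_s(t) for all t by simulating alpha-discounted random walks.

    graph: dict mapping each node to a non-empty list of out-neighbors
           (assumption: no dead-end nodes).
    """
    rng = random.Random(seed)
    stops = Counter()
    for _ in range(num_walks):
        u = s
        # Continue with probability 1 - alpha; terminate with probability alpha.
        while rng.random() >= alpha:
            u = rng.choice(graph[u])
        stops[u] += 1
    # The fraction of walks that terminate at t estimates pi_s(t).
    return {t: count / num_walks for t, count in stops.items()}
```

On a two-node cycle, for instance, the walk from node 0 can only stop at node 0 after an even number of steps, so the estimate for $π_0(0)$ should approach $α / (1 - (1-α)^2)$ as the number of walks grows.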

Paper Structure (33 sections, 3 theorems, 22 equations, 1 figure, 4 tables, 2 algorithms)

This paper contains 33 sections, 3 theorems, 22 equations, 1 figure, 4 tables, 2 algorithms.

Introduction
Preliminaries
Notations for Graphs
Definitions of PPR and PageRank
Definitions of PPR Queries
Basic Properties of PPR
Basic Techniques
The Monte Carlo Method
Power Iteration
Forward Push
Reverse Power Iteration
Backward Push
FP and BP on Undirected Graphs
Summary and Comparison of the Basic Techniques
Overview of PPR Algorithms
...and 18 more sections

Key Result

Theorem 1

For a preference vector $\boldsymbol{\sigma}$, we have $\boldsymbol{\pi}_{\boldsymbol{\sigma}}=\sum_{s\in V}\boldsymbol{\sigma}(s)\cdot\boldsymbol{\pi}_{s}$.
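
Theorem 1 can be checked numerically on a toy graph. The sketch below is illustrative only: the helper names `ppr_vector` and `linear_combination`, the three-node graph, and the preference vector are hypothetical, and the PPR vectors are approximated by a truncated power series under the assumption that every node has at least one out-neighbor.

```python
def ppr_vector(graph, sigma, alpha=0.2, iters=200):
    """Approximate pi_sigma as the sum over k of alpha * (1 - alpha)^k * sigma P^k."""
    pi = {v: 0.0 for v in graph}
    # Probability mass of walks that have not yet terminated, per node.
    alive = {v: sigma.get(v, 0.0) for v in graph}
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for u, mass in alive.items():
            pi[u] += alpha * mass  # walks that stop at u in this step
            share = (1 - alpha) * mass / len(graph[u])
            for v in graph[u]:
                nxt[v] += share  # walks that move on to an out-neighbor
        alive = nxt
    return pi


def linear_combination(graph, sigma, alpha=0.2):
    """Right-hand side of Theorem 1: sum over s of sigma(s) * pi_s."""
    combo = {v: 0.0 for v in graph}
    for s, weight in sigma.items():
        pi_s = ppr_vector(graph, {s: 1.0}, alpha)
        for v in graph:
            combo[v] += weight * pi_s[v]
    return combo


# Hypothetical toy graph and preference vector (not taken from the paper).
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
sigma = {"a": 0.5, "b": 0.3, "c": 0.2}
direct = ppr_vector(graph, sigma)
combined = linear_combination(graph, sigma)
max_gap = max(abs(direct[v] - combined[v]) for v in graph)
```

Here `max_gap` measures the largest entrywise difference between the two sides of the theorem; it should be zero up to floating-point rounding.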

Figures (1)

Figure 1: A running example of Forward Push on a toy graph. $s$ is the source node, $\alpha$ is set to $0.2$ and $r_\mathrm{max}^{(\mathrm{f})}$ is set to $0.3$. Each step stands for a single push operation and updated information is marked in red.
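
The push operation mentioned in the caption can be sketched as follows. This is a hedged illustration rather than the paper's Algorithm: the function name `forward_push`, the queue-based ordering, and the push condition (residue greater than $r_\mathrm{max}^{(\mathrm{f})}$ times the out-degree) follow the common formulation of Forward Push and are assumptions here. The toy graph from the figure is not reproduced, so the input graph is supplied by the caller and is assumed to have no dead-end nodes.

```python
from collections import deque


def forward_push(graph, s, alpha=0.2, r_max=0.3):
    """Return (reserve, residue) after running push operations from source s.

    graph: dict mapping each node to a non-empty list of out-neighbors.
    Assumed push condition: residue[u] > r_max * out_degree(u).
    """
    reserve = {v: 0.0 for v in graph}
    residue = {v: 0.0 for v in graph}
    residue[s] = 1.0
    queue = deque([s])
    queued = {s}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg = len(graph[u])
        if residue[u] <= r_max * deg:
            continue  # u no longer qualifies for a push
        mass = residue[u]
        residue[u] = 0.0
        reserve[u] += alpha * mass  # an alpha fraction settles at u
        for v in graph[u]:
            # The remaining mass is spread evenly over the out-neighbors.
            residue[v] += (1 - alpha) * mass / deg
            if v not in queued and residue[v] > r_max * len(graph[v]):
                queue.append(v)
                queued.add(v)
    return reserve, residue
```

Each push moves probability mass between the two vectors without creating or destroying any, so the reserves and residues together always sum to 1. The reserve of a node then serves as an underestimate of its PPR value with respect to `s`.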

Theorems & Definitions (4)

Definition 1: Probabilistic SSPPR Query with Relative Error Bounds
Theorem 1: The Linearity Theorem [jeh2003scaling]
Theorem 2: The Decomposition Theorem [jeh2003scaling]
Theorem 3: Symmetry of PPR on Undirected Graphs [avrachenkov2013choice]
