Approximating Single-Source Personalized PageRank with Absolute Error Guarantees
Zhewei Wei, Ji-Rong Wen, Mingji Yang
TL;DR
This work studies the classic Single-Source PPR query, and proposes an algorithm that provides approximations with absolute error guarantees with high probability, achieving an expected complexity of $\widetilde{O}\left(\sqrt{\sum_{t\in V}\pi(s,t)/d(t)}\big/\varepsilon_d\right)$.
Abstract
Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes $s$ and $t$ on a graph $G=(V,E)$, the PPR value $\pi(s,t)$ is defined as the probability that an $\alpha$-discounted random walk from $s$ terminates at $t$, where the walk terminates with probability $\alpha$ at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node $s$ to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates $\hat{\pi}(s,t)$ satisfy $\max_{t\in V}\big|\hat{\pi}(s,t)-\pi(s,t)\big|\le\varepsilon$ for a given error bound $\varepsilon$. We propose an algorithm that achieves this with high probability, with an expected running time of

- $\widetilde{O}\big(\sqrt{m}/\varepsilon\big)$ for directed graphs, where $m=|E|$;
- $\widetilde{O}\big(\sqrt{d_{\mathrm{max}}}/\varepsilon\big)$ for undirected graphs, where $d_{\mathrm{max}}$ is the maximum node degree in the graph;
- $\widetilde{O}\left(n^{\gamma-1/2}/\varepsilon\right)$ for power-law graphs, where $n=|V|$ and $\gamma\in\left(\frac{1}{2},1\right)$ is the extent of the power law.

These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring $\max_{t\in V}\big|\hat{\pi}(s,t)/d(t)-\pi(s,t)/d(t)\big|\le\varepsilon_d$ for a given error bound $\varepsilon_d$, where the graph is undirected and $d(t)$ is the degree of node $t$. We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of $\widetilde{O}\left(\sqrt{\sum_{t\in V}\pi(s,t)/d(t)}\big/\varepsilon_d\right)$. This improves over the previously known $O(1/\varepsilon_d)$ complexity.
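To make the random-walk definition of $\pi(s,t)$ concrete, the following is a minimal Monte Carlo sketch in Python. It is a plain baseline that simulates $\alpha$-discounted walks and counts where they stop; it is not the algorithm proposed in this work and does not attain the stated complexities. The function name `monte_carlo_ppr`, the adjacency-dict input format, the default parameters, and the choice to stop a walk at a node with no out-neighbors are all assumptions made for illustration.

```python
import random
from collections import Counter

def monte_carlo_ppr(adj, s, alpha=0.2, num_walks=20000, seed=0):
    """Estimate pi(s, t) for all t by simulating alpha-discounted walks from s.

    adj: dict mapping each node to a list of its out-neighbors (hypothetical format).
    Returns a dict mapping each node t to the fraction of walks that ended at t.
    """
    rng = random.Random(seed)
    stops = Counter()
    for _ in range(num_walks):
        v = s
        # At each step the walk terminates with probability alpha;
        # otherwise it moves to a uniformly random out-neighbor.
        while rng.random() >= alpha:
            if not adj[v]:
                break  # assumption: a walk at a dead-end node stops there
            v = rng.choice(adj[v])
        stops[v] += 1
    return {t: stops[t] / num_walks for t in adj}

# Usage on a tiny undirected path graph 0 - 1 - 2 (each edge listed in both directions).
graph = {0: [1], 1: [0, 2], 2: [1]}
estimates = monte_carlo_ppr(graph, s=0)
```

The estimates sum to 1 because every walk stops at exactly one node. The per-node error of such a baseline shrinks only with the number of simulated walks.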
