
Revisiting Local PageRank Estimation on Undirected Graphs: Simple and Optimal

Hanzhi Wang

TL;DR

This work tackles the problem of locally estimating the PageRank score of a single target node in undirected graphs. It introduces BackMC, a simple Monte Carlo–style algorithm based on alpha-discounted random walks from the target, and proves a tight worst-case time bound of $O\left(\frac{1}{d_{\mathrm{min}}}\cdot \min\left(d_t, m^{1/2}\right)\right)$, along with a matching lower bound. The authors provide detailed analysis showing unbiasedness, variance control, and a median-based amplification to meet a specified failure probability, while maintaining an optimal overall runtime. Empirical results on real-world and synthetic graphs demonstrate BackMC's substantial gains in both efficiency and accuracy over prior methods, including SetPush. The work advances the theoretical understanding of local PageRank in undirected graphs and offers a practical tool for scalable graph analysis and downstream tasks like graph neural networks.
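The idea of estimating PageRank from walks that start at the target can be illustrated with a short sketch. It is not the paper's BackMC pseudocode, which this summary does not reproduce; it is a plain Monte Carlo estimator built on a standard fact about undirected graphs: personalized PageRank satisfies $\boldsymbol{\pi}(s,t)\, d_s = \boldsymbol{\pi}(t,s)\, d_t$, so $\boldsymbol{\pi}(t) = \frac{d_t}{n}\,\mathbb{E}\left[1/d_S\right]$, where $S$ is the endpoint of an $\alpha$-discounted random walk started at $t$. The function names, the adjacency-list representation, the fixed walk counts, and the power-iteration reference are all assumptions made for illustration, and the graph is assumed to have no isolated nodes.

```python
import random
import statistics


def discounted_walk_endpoint(adj, start, alpha, rng):
    """Run one alpha-discounted walk: stop with probability alpha at each step,
    otherwise move to a uniformly random neighbor. Returns the stopping node."""
    node = start
    while rng.random() >= alpha:
        node = rng.choice(adj[node])
    return node


def estimate_pagerank(adj, t, alpha, walks_per_group, groups, rng):
    """Hypothetical estimator of the PageRank of t on an undirected graph.

    Each walk from t ending at s contributes d_t / (n * d_s); the contributions
    are averaged within each group, and the median of the group means is
    returned (median-based amplification of the success probability)."""
    n = len(adj)
    d_t = len(adj[t])
    group_means = []
    for _ in range(groups):
        total = 0.0
        for _ in range(walks_per_group):
            s = discounted_walk_endpoint(adj, t, alpha, rng)
            total += 1.0 / len(adj[s])
        group_means.append(d_t / n * total / walks_per_group)
    return statistics.median(group_means)


def exact_pagerank(adj, alpha, iters=200):
    """Reference PageRank by power iteration (touches the whole graph,
    so it is only a correctness check, not a local algorithm)."""
    n = len(adj)
    pi = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        nxt = {v: alpha / n for v in adj}
        for u, nbrs in adj.items():
            share = (1.0 - alpha) * pi[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        pi = nxt
    return pi


if __name__ == "__main__":
    # Small undirected graph given as symmetric adjacency lists (illustrative).
    adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
    rng = random.Random(0)
    print("estimate:", estimate_pagerank(adj, 0, 0.2, 5000, 5, rng))
    print("exact:   ", exact_pagerank(adj, 0.2)[0])
```

Every sample lies between $\frac{d_t}{n\, d_{\mathrm{max}}}$ and $\frac{d_t}{n\, d_{\mathrm{min}}}$, which gives some intuition for why a larger minimum degree should reduce the number of walks needed. The sketch uses fixed walk counts rather than the adaptive budget required to reach the stated bound.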

Abstract

We propose a simple and optimal algorithm, BackMC, for local PageRank estimation in undirected graphs: given an arbitrary target node $t$ in an undirected graph $G$ comprising $n$ nodes and $m$ edges, BackMC accurately estimates the PageRank score of node $t$ while assuring a small relative error and a high success probability. The worst-case computational complexity of BackMC is upper bounded by $O\left(\frac{1}{d_{\mathrm{min}}}\cdot \min\left(d_t, m^{1/2}\right)\right)$, where $d_{\mathrm{min}}$ denotes the minimum degree of $G$ and $d_t$ denotes the degree of $t$. Compared to the previously best upper bound of $O\left(\log{n}\cdot \min\left(d_t, m^{1/2}\right)\right)$ (VLDB '23), which is derived from a significantly more complex algorithm and analysis, our BackMC improves the computational complexity for this problem by a factor of $\Theta\left(\frac{\log{n}}{d_{\mathrm{min}}}\right)$ with a much simpler algorithm. Furthermore, we establish a matching lower bound of $\Omega\left(\frac{1}{d_{\mathrm{min}}}\cdot \min\left(d_t, m^{1/2}\right)\right)$ for any algorithm that attempts to solve the problem of local PageRank estimation, demonstrating the theoretical optimality of our BackMC. We conduct extensive experiments on various large-scale real-world and synthetic graphs, where BackMC consistently shows superior performance.

Paper Structure

This paper contains 15 sections, 2 theorems, 12 equations, 5 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Given an undirected graph $G$ and a target node $t\in V$, the expected computational complexity of BackMC for computing a multiplicative $(1\pm c)$-approximation of $\boldsymbol{\pi}(t)$ with probability at least $1-p_f$ is $O\left(\frac{1}{d_{\mathrm{min}}}\cdot \min\left(d_t, m^{1/2}\right)\right)$.

Figures (5)

  • Figure 1: Hard instances of the lower bound proof.
  • Figure 2: Actual relative error vs. query time (seconds); target node $\boldsymbol{t}$ sampled uniformly; $\alpha=0.2$.
  • Figure 3: Actual relative error vs. query time (seconds); target node $\boldsymbol{t}$ sampled from the degree distribution; $\alpha=0.2$.
  • Figure 4: Actual relative error vs. query time (seconds); target node $\boldsymbol{t}$ sampled uniformly; $\alpha=0.01$.
  • Figure 5: Actual relative error vs. query time (seconds) on synthetic graphs; target node $\boldsymbol{t}$ sampled uniformly; $\alpha=0.2$.

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2