Estimating Random-Walk Probabilities in Directed Graphs

Christian Bertram, Mads Vestergaard Jensen, Mikkel Thorup, Hanzhi Wang, Shuyi Yan

TL;DR

This work studies α-discounted random walks on directed graphs and the problem of estimating π(s,t), the Personalized PageRank (PPR) score, in the single-pair, single-source, single-target, and single-node variants under various graph-access query models. It derives tight upper and lower bounds for all problem variants and query combinations, closing the polynomial gaps that remained in both the worst-case and average-case settings. The key technical advance is a new randomized bidirectional framework that combines randomized backward propagation with selective Monte Carlo estimation; it handles the one query combination for which existing algorithms were not already optimal, and it is optimal up to polylogarithmic factors. The results clarify how the available query types (IN-SORTED, ADJ, JUMP) affect the complexity of PPR estimation, which can inform the design of graph-access APIs for large-scale graph analytics.

Abstract

We study discounted random walks in directed graphs. In each step, the walk either terminates with a constant probability $\alpha$, or proceeds to a random out-neighbor. Our goal is to estimate the probability $\pi(s, t)$ that a discounted random walk starting from $s$ terminates at $t$. This probability is also known as the Personalized PageRank (PPR) score, which measures the relevance of $t$ to $s$, for instance, when $s$ and $t$ are web pages on the Internet. We aim to estimate $\pi(s, t)$ within a constant relative error with constant probability. A variety of algorithms have been developed for several problem variants, such as single-pair, single-source, single-target, and single-node estimation, under both worst-case and average-case settings, and for different combinations of allowed graph queries. However, in many important cases, there remain polynomial gaps between known upper and lower bounds. In this paper, we establish tight bounds for all problem variants and query combinations, closing all existing gaps in both the worst-case and average-case settings. We provide tight (up to logarithmic factors) lower bounds, showing that for all but one query combination, existing algorithms are already optimal. For the remaining case, we design a novel algorithm that matches our new lower bound, thereby achieving optimality. This is the first algorithm to exploit this specific query combination. It uses a new randomized bidirectional framework that combines randomized backward propagation with selective Monte Carlo estimation.
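To make the definition of the discounted walk concrete, the following is a minimal sketch of the plain Monte Carlo estimator for the single-pair problem: simulate many walks from $s$ and report the fraction that terminate at $t$. It is not the paper's new algorithm and makes no claim about query complexity. The names `walk_endpoint` and `estimate_ppr`, the dictionary-of-lists graph representation, and the rule that a walk stops at a node with no out-neighbors are all assumptions made for illustration.

```python
import random


def walk_endpoint(graph, s, alpha, rng):
    """Simulate one discounted random walk from s and return the node where it ends.

    graph maps each node to a list of its out-neighbors (hypothetical representation).
    """
    node = s
    while True:
        # In each step the walk terminates with probability alpha.
        if rng.random() < alpha:
            return node
        out_neighbors = graph[node]
        # Assumption: a walk that reaches a node without out-neighbors stops there.
        if not out_neighbors:
            return node
        # Otherwise it proceeds to a uniformly random out-neighbor.
        node = rng.choice(out_neighbors)


def estimate_ppr(graph, s, t, alpha, num_walks, seed=0):
    """Estimate pi(s, t) as the fraction of num_walks walks from s that end at t."""
    rng = random.Random(seed)
    hits = sum(walk_endpoint(graph, s, alpha, rng) == t for _ in range(num_walks))
    return hits / num_walks


if __name__ == "__main__":
    # Toy graph: s -> t, and t has a self-loop, so pi(s, s) = alpha and pi(s, t) = 1 - alpha.
    toy_graph = {"s": ["t"], "t": ["t"]}
    print(estimate_ppr(toy_graph, "s", "t", alpha=0.2, num_walks=20000))
```

On the toy graph the walk can end at $s$ only by terminating in its very first step, so the estimate for $\pi(s, t)$ should be close to $1 - \alpha$. Plain sampling of this kind needs on the order of $1/\pi(s, t)$ walks before it sees $t$ at all, which is one reason bidirectional approaches such as the one described in the abstract are of interest.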

Paper Structure

This paper contains 46 sections, 38 theorems, 46 equations, 12 figures, 2 tables, 6 algorithms.

Key Result

Lemma 1.1

For the single-pair and single-source problems, the average complexity over all possible sources is the same as the complexity for a given worst-case source. This is for asymptotic complexity in terms of $n$, $m$, and $\delta$, in the adjacency-list model with any subset of $\textnormal{JUMP}$, $\te

Figures (12)

  • Figure 1: Current best hard instance for the worst-case single-pair problem.
  • Figure 2: Hard instance for the worst-case single-pair problem. With the red edge pair, $s$ does not reach $t$, but with the blue edge pair, $s$ does reach $t$. An algorithm has to distinguish between these two cases, and because of the regular structure, this essentially means that it has to check a constant fraction of the edges.
  • Figure 3: Hard instance for the average-case single-pair problem. With the red edge pair, $s$ does not reach any $t \in W_2$, but with the blue edge pair, $s$ does reach every $t$ in the appropriate group of $W_2$. An algorithm has to distinguish between these two cases, and because of the regular structure, this essentially means that it has to check a constant fraction of the edges from the upper component or a constant fraction of the edges into the appropriate group of $V_2$.
  • Figure 4: Output-size lower bound constructions.
  • Figure 5: Hard instance for the worst-case single-target problem with $\textnormal{ADJ}$.
  • ...and 7 more figures

Theorems & Definitions (66)

  • Lemma 1.1
  • proof
  • Theorem 2.1: page1999pagerankMC
  • Theorem 2.2: BiPPRMCpage1999pagerank
  • Theorem 2.3
  • Lemma 2.4
  • proof
  • Lemma 2.5
  • proof
  • Lemma 2.6
  • ...and 56 more