Personalized PageRank Estimation in Undirected Graphs

Christian Bertram; Mads Vestergaard Jensen

TL;DR

This work provides a complete characterization of PPR estimation in undirected graphs by giving tight bounds (up to logarithmic factors) for all problems and model variants in both the worst-case and average-case settings.

Abstract

Given an undirected graph $G=(V, E)$, the Personalized PageRank (PPR) of $t\in V$ with respect to $s\in V$, denoted $\pi(s,t)$, is the probability that an $\alpha$-discounted random walk starting at $s$ terminates at $t$. We study the time complexity of estimating $\pi(s,t)$ with constant relative error and constant failure probability, whenever $\pi(s,t)$ is above a given threshold parameter $\delta\in(0,1)$. We consider common graph-access models and furthermore study the single-source, single-target, and single-node (PageRank centrality) variants of the problem. We provide a complete characterization of PPR estimation in undirected graphs by giving tight bounds (up to logarithmic factors) for all problems and model variants in both the worst-case and average-case settings. This includes both new upper and lower bounds. Tight bounds were recently obtained by Bertram, Jensen, Thorup, Wang, and Yan for directed graphs. However, their lower bound constructions rely on asymmetry and therefore do not carry over to undirected graphs. At the same time, undirected graphs exhibit additional structure that can be exploited algorithmically. Our results resolve the undirected case by developing new techniques that capture both aspects, yielding tight bounds.
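To make the definition concrete, here is a minimal Monte Carlo sketch of estimating the PPR of $t$ with respect to $s$ by simulating walks. It is an illustration only, not the paper's algorithm. It assumes the common convention that the walk stops with probability $\alpha$ before each step and otherwise moves to a uniformly random neighbor, and that the graph is an adjacency list without isolated vertices. The names `sample_walk_endpoint` and `estimate_ppr` are hypothetical.

```python
import random


def sample_walk_endpoint(adj, s, alpha, rng):
    """Run one alpha-discounted walk from s and return the vertex where it stops.

    Assumed convention: before each step the walk terminates with
    probability alpha; otherwise it moves to a uniformly random neighbor.
    """
    v = s
    while rng.random() >= alpha:  # continue with probability 1 - alpha
        v = rng.choice(adj[v])
    return v


def estimate_ppr(adj, s, t, alpha=0.2, num_walks=20000, seed=0):
    """Estimate pi(s, t) as the fraction of walks from s that terminate at t."""
    rng = random.Random(seed)
    hits = sum(
        sample_walk_endpoint(adj, s, alpha, rng) == t for _ in range(num_walks)
    )
    return hits / num_walks


if __name__ == "__main__":
    # Hypothetical toy graph: the path 0 - 1 - 2 as an adjacency list.
    path = [[1], [0, 2], [1]]
    print(estimate_ppr(path, s=0, t=2))
```

The number of walks needed for constant relative error grows as the PPR value approaches the threshold $\delta$, which is why plain sampling alone is typically combined with other techniques.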

Paper Structure (38 sections, 36 theorems, 41 equations, 3 figures, 2 tables, 6 algorithms)

Introduction
Our result: A complete characterization.
Lower bounds.
Upper bounds.
Estimation requirements.
Paper organization.
Preliminaries
PageRank and PPR
Monte Carlo Sampling
Local Push
Power Method
Randomized local push
Upper bounds
Single node
Overview.
...and 23 more sections

Key Result

Lemma 2.1.1

For any vertices $u,v\in V$, we have $\pi(u,v)d(u) = \pi(v,u)d(v)$.
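The symmetry in this lemma can be checked numerically on a small graph. The sketch below is an illustration, not part of the paper. It assumes the walk stops with probability $\alpha$ before each step, takes $d(u)$ to be the degree of $u$, and approximates $\pi(s,\cdot)$ by truncating the series $\alpha\sum_k (1-\alpha)^k P^k$. The function names and the toy graph are hypothetical.

```python
def ppr_vector(adj, s, alpha=0.2, iters=200):
    """Approximate pi(s, .) by propagating the mass of walks that are still alive.

    After k steps, `walk[u]` holds the probability that the walk has not yet
    stopped and is at u; a fraction alpha of that mass stops at u.
    """
    n = len(adj)
    walk = [0.0] * n
    walk[s] = 1.0
    pi = [0.0] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for u, mass in enumerate(walk):
            if mass == 0.0:
                continue
            pi[u] += alpha * mass                     # walk stops at u
            share = (1 - alpha) * mass / len(adj[u])  # otherwise moves on
            for v in adj[u]:
                nxt[v] += share
        walk = nxt
    return pi


def max_symmetry_gap(adj, alpha=0.2):
    """Largest |pi(u,v) d(u) - pi(v,u) d(v)| over all vertex pairs."""
    n = len(adj)
    pi = [ppr_vector(adj, s, alpha) for s in range(n)]
    return max(
        abs(pi[u][v] * len(adj[u]) - pi[v][u] * len(adj[v]))
        for u in range(n)
        for v in range(n)
    )


if __name__ == "__main__":
    # Hypothetical undirected graph with edges 0-1, 0-2, 1-2, 1-3.
    graph = [[1, 2], [0, 2, 3], [0, 1], [1]]
    print(max_symmetry_gap(graph))  # expected to be at floating-point noise level
```

The identity is what lets a single-target quantity be read off from a single-source one (and vice versa) after rescaling by degrees, which is structure that directed graphs lack.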

Figures (3)

Figure 1: Hard instance for detecting a swap for $x=4$, $y=3$, and some $q=(q_A,q_B,q_C,q_D)$. With the red edge pair, $s$ does not reach $t$, but with the blue edge pair, $s$ reaches $t$. An algorithm has to distinguish between the two cases to satisfy the estimation requirements.
Figure 2: Hard instance for detecting a swap for $x=4$ and some $q=(s,q_B,t, q_C)$.
Figure 3: Hard instance for detecting a swap for $x=3$ and some $q=(b,q_C,t,q_D)$. When the red edge pair is swapped for the blue edge pair, walks starting in $A$ cause a substantial increase in the PageRank of $t$. An algorithm has to distinguish between the two cases to satisfy the estimation requirements.

Theorems & Definitions (72)

Lemma 2.1.1
proof
Theorem 3.1.1
Lemma 3.1.2
proof
Lemma 3.1.3
proof
proof: Proof of \ref{thm:sn-wc}
Theorem 3.2.1
Lemma 3.2.2
...and 62 more
