Estimating Hitting Times Locally At Scale

Themistoklis Haris, Fabian Spaeh, Spyros Dragazis, Charalampos Tsourakakis

TL;DR

This work estimates hitting times $H_G(u,v)$ in massive graphs using only local access. It introduces two local estimators: a meeting-time-based method analyzed via the Kronecker product and Markov chain concentration bounds, and a spectral-cutoff method that accounts for the asymmetry of hitting times through a cutoff $\ell$ tied to the spectral gap $\lambda$. The authors establish upper and lower bounds on sample complexity, connect hitting-time estimation to distribution testing, and validate both approaches on synthetic and real networks, reporting favorable accuracy and scalability. The results enable scalable analyses of centrality, ranking, and epidemiology tasks that rely on hitting-time metrics, and lay groundwork for fully local, sublinear-time algorithms in network science.

Abstract

Hitting times provide a fundamental measure of distance in random processes, quantifying the expected number of steps for a random walk starting at node $u$ to reach node $v$. They have broad applications across domains such as network centrality analysis, ranking and recommendation systems, and epidemiology. In this work, we develop local algorithms for estimating hitting times between a pair of vertices $u,v$ without accessing the full graph, overcoming scalability issues of prior global methods. Our first algorithm uses the key insight that hitting time computations can be truncated at the meeting time of two independent random walks from $u$ and $v$. This leads to an efficient estimator analyzed via the Kronecker product graph and Markov chain Chernoff bounds. Our second algorithm extends the work of [Peng et al.; KDD 2021] with a novel adaptation of the spectral cutoff technique that accounts for the asymmetry of hitting times. This adaptation captures the directionality of the underlying random walk and requires non-trivial modifications to ensure accuracy and efficiency. In addition to the algorithmic upper bounds, we provide tight asymptotic lower bounds. We also reveal a connection between hitting time estimation and distribution testing, and validate our algorithms with experiments on both real and synthetic data.
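To make the definition of the hitting time concrete, the following is a minimal sketch of a naive Monte Carlo baseline under local access. It is not either of the paper's two estimators: it simply simulates walks from $u$ until they reach $v$ and averages the step counts. The function name `estimate_hitting_time`, the `neighbors` callback, and the `max_steps` truncation are assumptions made for illustration; truncating long walks biases the estimate downward.

```python
import random


def estimate_hitting_time(neighbors, u, v, num_walks=1000, max_steps=10_000, seed=0):
    """Naive Monte Carlo estimate of the hitting time from u to v.

    `neighbors` is a hypothetical local-access oracle: given a node, it
    returns the list of that node's neighbors. No global view of the graph
    is needed. Walks longer than `max_steps` are cut off (an assumption of
    this sketch), which biases the estimate downward.
    """
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(num_walks):
        node, steps = u, 0
        # Walk until v is reached or the step budget is exhausted.
        while node != v and steps < max_steps:
            node = rng.choice(neighbors(node))
            steps += 1
        total_steps += steps
    return total_steps / num_walks


# Usage on the triangle graph K3, where each step from a node other than v
# reaches v with probability 1/2, so the hitting time is exactly 2.
k3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
estimate = estimate_hitting_time(lambda x: k3[x], 0, 1, num_walks=5000)
```

On graphs where hitting times are large, this baseline needs long walks, which is the scalability problem the local estimators in the paper are designed to avoid.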

Paper Structure

This paper contains 27 sections, 28 theorems, 95 equations, 7 figures, 3 tables, 3 algorithms.

Key Result

Lemma 2.1

It is true that:

Figures (7)

  • Figure 1: A "barbell"-like graph: a star, a path and a clique
  • Figure 2: Hitting time estimation on synthetic networks as a function of the number of nodes. Depicted are: Barabási-Albert graphs (left), Erdős-Rényi graphs (middle) and Stochastic Block Model (SBM) graphs (right).
  • Figure 3: A "barbell"-like graph: a star, a path and a clique
  • Figure 4: Ablation study for the number of random walks in the Football and Facebook networks.
  • Figure 5: Correlation between the hitting time and the degree product $\deg(u) \cdot \deg(v)$ (top) and the product of PageRank centralities $\mathrm{pagerank}(u) \cdot \mathrm{pagerank}(v)$ (bottom) on the Football (left) and Facebook (right) networks.
  • ...and 2 more figures

Theorems & Definitions (56)

  • Definition 2.1: Mixing Time
  • Definition 2.2: Hitting Time
  • Lemma 2.1: cohen2016faster
  • Definition 2.3: Kronecker Product
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Lemma 3.3
  • Lemma 3.4
  • ...and 46 more
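
The Kronecker product listed in Definition 2.3 is the standard way to describe two independent random walks as a single Markov chain: if $P$ is the transition matrix of one walk, then $P \otimes P$ is the transition matrix of the pair, and the two walks meet when the pair chain reaches a diagonal state $(x, x)$. The sketch below illustrates this on a three-node path. The helper `kron`, the state-indexing function `idx`, and the example graph are assumptions for illustration; the paper's own construction is not shown in this excerpt.

```python
def kron(A, B):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


# Transition matrix of a simple random walk on the path 0 - 1 - 2.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
n = len(P)

# Transition matrix of two independent walks moving simultaneously.
PP = kron(P, P)


def idx(a, b):
    """Index of the pair state (a, b) in the product chain."""
    return a * n + b


# Probability that the pair moves from (1, 1) to (0, 2) equals
# P[1][0] * P[1][2], since the two walks step independently.
prob = PP[idx(1, 1)][idx(0, 2)]

# Diagonal states (x, x) are the ones where the two walks have met.
meeting_states = [idx(x, x) for x in range(n)]
```

The product chain has $n^2$ states, so materializing it is only feasible for small graphs; here it serves to illustrate the object, not as a way to compute anything at scale.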