Estimating Hitting Times Locally At Scale
Themistoklis Haris, Fabian Spaeh, Spyros Dragazis, Charalampos Tsourakakis
TL;DR
This work tackles the challenge of estimating hitting times $H_G(u,v)$ in massive graphs using only local access to the graph. It introduces two local estimators: a meeting-time-based method analyzed via the Kronecker product graph and Markov chain concentration bounds, and a spectral-cutoff method that accounts for the asymmetry of hitting times through a carefully chosen cutoff $\ell$ tied to the spectral gap $\lambda$. The authors establish upper and lower bounds on sample complexity, connect hitting-time estimation to distribution testing, and validate both approaches on synthetic and real networks, reporting favorable accuracy and scalability. The results enable scalable analyses of centrality, ranking, and epidemiology tasks that rely on hitting-time metrics, and lay groundwork for fully local, sublinear-time algorithms in network science.
Abstract
Hitting times provide a fundamental measure of distance in random processes, quantifying the expected number of steps for a random walk starting at node $u$ to reach node $v$. They have broad applications across domains such as network centrality analysis, ranking and recommendation systems, and epidemiology. In this work, we develop local algorithms for estimating hitting times between a pair of vertices $u,v$ without accessing the full graph, overcoming the scalability issues of prior global methods. Our first algorithm uses the key insight that hitting time computations can be truncated at the meeting time of two independent random walks from $u$ and $v$. This leads to an efficient estimator analyzed via the Kronecker product graph and Markov chain Chernoff bounds. Our second algorithm extends the work of [Peng et al.; KDD 2021] with a novel adaptation of the spectral cutoff technique that accounts for the asymmetry of hitting times. This adaptation captures the directionality of the underlying random walk and requires non-trivial modifications to ensure accuracy and efficiency. In addition to the algorithmic upper bounds, we provide tight asymptotic lower bounds. Finally, we reveal a connection between hitting time estimation and distribution testing, and validate our algorithms using experiments on both real and synthetic data.
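To make the quantity concrete, the following is a minimal sketch of a naive Monte Carlo estimator of $H_G(u,v)$ that touches the graph only through neighbor queries. It is a simple baseline for illustration, not either of the two algorithms described in the abstract. The function name `estimate_hitting_time`, the `neighbors` callback, and the parameters `num_walks` and `max_steps` are assumptions introduced here; capping each walk at `max_steps` biases the estimate downward when walks are cut off.

```python
import random


def estimate_hitting_time(neighbors, u, v, num_walks=1000, max_steps=10_000, seed=0):
    """Average the number of steps simple random walks from u take to reach v.

    `neighbors` is a hypothetical local-access oracle: given a node, it returns
    a non-empty list of that node's neighbors. Walks that have not reached v
    after `max_steps` steps are cut off and counted as `max_steps`.
    """
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(num_walks):
        node, steps = u, 0
        while node != v and steps < max_steps:
            node = rng.choice(neighbors(node))  # one local neighbor query
            steps += 1
        total_steps += steps
    return total_steps / num_walks


# Usage on a triangle graph, where the exact hitting time between two
# distinct nodes is 2 (each step reaches the target with probability 1/2).
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
estimate = estimate_hitting_time(triangle.__getitem__, 0, 2, num_walks=5000)
print(f"estimated H(0, 2) = {estimate:.3f}")
```

Each walk here can take as many steps as the hitting time itself, which may be very large on massive graphs; this cost is what motivates truncating the computation early.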
