Scalable and Provable Kemeny Constant Computation on Static and Dynamic Graphs: A 2-Forest Sampling Approach

Cheng Li, Meihao Liao, Rong-Hua Li, Guoren Wang

TL;DR

This paper tackles computing the Kemeny constant (KC) $\kappa(G)$ on large and evolving graphs. It introduces a new forest-based formula that expresses KC via 2-forests, together with a path-mapped correspondence from uniform spanning trees to 2-forests that enables unbiased, scalable estimation. The proposed Tree-To-Forest (TTF) algorithm samples uniform spanning trees with Wilson’s method and derives each sample's KC contribution using a DFS+BIT traversal, achieving near-linear time per sample. To handle dynamic graphs, two sample-maintenance strategies (BSM and ISM) preserve estimator correctness while dramatically reducing recomputation. Extensive experiments on 10 real-world datasets show that the approach outperforms state-of-the-art methods in both static and dynamic settings, with strong theoretical guarantees and practical efficiency.
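The tree-sampling step mentioned above can be illustrated with a minimal sketch of Wilson's loop-erased random walk. It assumes a connected, undirected graph stored in a hypothetical adjacency dict `adj`; the name `wilson_ust` is illustrative and not taken from the paper, and the sketch covers only the sampling, not the path mapping or the DFS+BIT traversal.

```python
import random


def wilson_ust(adj, root, rng=random):
    """Sample a uniform spanning tree of a connected undirected graph.

    adj  : dict mapping each node to a list of its neighbours (assumed format).
    root : node at which the sampled tree is rooted.
    rng  : object with a .choice method (the random module or a random.Random).
    Returns a dict mapping every node to its parent; the root maps to None.
    """
    parent = {root: None}
    for start in adj:
        if start in parent:
            continue
        # Random walk from `start` until it hits the current tree. Overwriting
        # next_hop on a revisit keeps only the last exit, which erases loops.
        next_hop = {}
        u = start
        while u not in parent:
            next_hop[u] = rng.choice(adj[u])
            u = next_hop[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in parent:
            parent[u] = next_hop[u]
            u = next_hop[u]
    return parent
```

Calling `wilson_ust(adj, r)` repeatedly with independent randomness yields independent spanning-tree samples rooted at `r`, which is the kind of input a tree-based estimator consumes.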

Abstract

The Kemeny constant, defined as the expected hitting time of random walks from a source node to a randomly chosen target node, is a fundamental metric in graph data management with many real-world applications. However, computing it exactly on large graphs is highly challenging, as it requires inverting large graph matrices. Existing solutions mainly rely on approximate random-walk-based methods, which still need large sample sizes and lack strong theoretical guarantees. In this paper, we propose a new approach for approximating the Kemeny constant via 2-forest sampling. We first derive an unbiased estimator expressed through spanning trees by introducing a path mapping technique that establishes a direct correspondence between spanning trees and certain classes of 2-forests. Compared to random-walk-based estimators, 2-forest-based estimators yield a better theoretical bound. We further design efficient algorithms to sample and traverse spanning trees, leveraging data structures such as the Binary Indexed Tree (BIT) for optimization. Our theoretical analysis shows that the Kemeny constant can be approximated with relative error $ε$ in $O\left(\frac{Δ^2\bar{d}^2}{ε^2}(τ+ n\min(\log n, Δ))\right)$ time, where $τ$ is the tree-sampling time, $\bar{d}$ is the average degree, and $Δ$ is the graph diameter. This complexity is near-linear in practice. Moreover, existing methods largely target static graphs and lack efficient mechanisms for dynamic updates. To address this, we propose two sample maintenance strategies that partially update samples while preserving accuracy on dynamic graphs. Extensive experiments on 10 large real-world datasets demonstrate that our method consistently outperforms state-of-the-art approaches in both efficiency and accuracy on static and dynamic graphs.
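To make the hitting-time definition concrete, here is a small exact baseline in Python. It is a sketch under stated assumptions, not the paper's method: the target is drawn from the stationary distribution $\pi(t) = d(t)/2m$, the convention $H(s,s)=0$ is used, the graph is simple, connected, and given as a hypothetical adjacency dict, and the helper names are illustrative. It solves one linear system per target, which is exactly the cost that makes exact computation impractical on large graphs.

```python
from fractions import Fraction


def solve_linear(M, b):
    """Solve M x = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(M)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [x * inv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def kemeny_constant(adj, source):
    """Sum over targets t of pi(t) * H(source, t), with H(source, source) = 0."""
    nodes = list(adj)
    deg = {v: len(adj[v]) for v in nodes}
    two_m = sum(deg.values())
    kc = Fraction(0)
    for target in nodes:
        if target == source:
            continue
        others = [v for v in nodes if v != target]
        idx = {v: i for i, v in enumerate(others)}
        # Hitting times to `target` satisfy
        #   h(v) = 1 + (1/d(v)) * sum of h(w) over neighbours w != target.
        M = [[Fraction(0)] * len(others) for _ in others]
        for v in others:
            M[idx[v]][idx[v]] += 1
            for w in adj[v]:
                if w != target:
                    M[idx[v]][idx[w]] -= Fraction(1, deg[v])
        h = solve_linear(M, [Fraction(1)] * len(others))
        kc += Fraction(deg[target], two_m) * h[idx[source]]
    return kc
```

On the triangle graph `{0: [1, 2], 1: [0, 2], 2: [0, 1]}` this returns `Fraction(4, 3)` for every choice of `source`, illustrating that the value does not depend on the source node.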

Paper Structure

This paper contains 17 sections, 14 theorems, 16 equations, 21 figures, 2 tables, 3 algorithms.

Key Result

theorem 1

In the forest formula of theorem 1, $\Gamma$ denotes the set of all spanning trees of $G$, $r$ is an arbitrarily fixed node, $\mathrm{vol}(T_1) = \sum_{u\in T_1}d(u)$ is the volume of $T_1$, the tree containing $r$, and $\mathbb{F}_{r\mid u}$ is the set of 2-forests in which $r$ and $u$ belong to different trees.
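The objects named here can be enumerated by brute force on a tiny graph. The sketch below is only an illustration of the definitions: it lists the spanning trees $\Gamma$, the 2-forests in $\mathbb{F}_{r\mid u}$, and $\mathrm{vol}(T_1)$ for each such forest. It does not evaluate the forest formula itself, and all function names and the edge-list format are assumptions made for this example.

```python
from itertools import combinations


def components(nodes, edges):
    """Return the connected components of an acyclic edge set, or None on a cycle."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return None  # this edge would close a cycle
        parent[ra] = rb
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]


def spanning_trees(nodes, edges):
    """All acyclic edge subsets of size n - 1, i.e. the set Gamma."""
    k = len(nodes) - 1
    return [es for es in combinations(edges, k)
            if components(nodes, es) is not None]


def separating_two_forests(nodes, edges, r, u):
    """All 2-forests with r and u in different trees, paired with T1 (the tree of r)."""
    result = []
    for es in combinations(edges, len(nodes) - 2):
        comps = components(nodes, es)
        if comps is None:
            continue
        t1 = next(c for c in comps if r in c)
        if u not in t1:
            result.append((es, t1))
    return result


def volume(tree_nodes, edges):
    """vol(T1): sum of degrees in G of the nodes in T1."""
    return sum(1 for a, b in edges for v in (a, b) if v in tree_nodes)
```

For the 4-cycle with `nodes = [0, 1, 2, 3]` and `edges = [(0, 1), (1, 2), (2, 3), (3, 0)]`, `spanning_trees` returns 4 trees and `separating_two_forests(nodes, edges, 0, 2)` returns 4 forests. Enumeration is exponential in the number of edges, so it is useful only for checking small cases by hand.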

Figures (21)

  • Figure 1: An example graph, its spanning trees and 2-forests
  • Figure 3: Illustration of Transforming Spanning Trees to 2-Forests Using Path Mapping
  • Figure 4: Illustration of the DFS process
  • Figure 5: Illustration of Basic Samples Maintenance for Edge Insertion
  • Figure 6: Illustration of BSM for Edge Deletion
  • ...and 16 more figures

Theorems & Definitions (22)

  • theorem 1: Forest Formula of $\mathsf{KC}$
  • Example 1
  • definition 1: Path Mapping
  • Example 2
  • theorem 2
  • lemma 1
  • lemma 2
  • Example 3
  • theorem 3: Correctness of Naive Method
  • theorem 4: Correctness of Optimized Method
  • ...and 12 more