BD-Index: Scalable Biharmonic Distance Queries on Large Graphs via Divide-and-Conquer Indexing

Yueyang Pan, Meihao Liao, Rong-Hua Li

TL;DR

This work tackles the challenge of exact single-pair biharmonic distance queries on large graphs by reframing BD as a distance between random-walk distributions and introducing a divide-and-conquer framework. The authors develop BD-Index, a hierarchical, bottom-up data structure that uses small cut sets to decompose BD into subgraph components and a compact set of contribution matrices, enabling exact queries with space $O(nh)$ and query time $O(nh)$. The approach shows order-of-magnitude speedups over state-of-the-art exact solvers and competitive performance against approximate methods, while maintaining numerical exactness up to floating-point precision. Case studies on road networks and GNN over-squashing demonstrate BD-Index’s practical utility for identifying critical links and guiding graph rewiring to improve learning outcomes.

Abstract

Biharmonic distance (BD) is a powerful graph distance metric with many applications, including identifying critical links in road networks and mitigating the over-squashing problem in graph neural networks (GNNs). However, computing BD is extremely difficult, especially on large graphs. In this paper, we focus on the problem of single-pair BD queries. Existing methods mainly rely on random walk-based approaches, which work well on some graphs but become inefficient when the random walk cannot mix rapidly. To overcome this issue, we first show that the biharmonic distance between two nodes $s,t$, denoted by $b(s,t)$, can be interpreted as the distance between two random walk distributions starting from $s$ and $t$. To estimate these distributions, the required random walk length is large when the underlying graph can be easily cut into smaller pieces. Inspired by this observation, we present novel formulas for BD that represent $b(s,t)$ by independent random walks within two node sets $\mathcal{V}_s$, $\mathcal{V}_t$ separated by a small cut set $\mathcal{V}_{cut}$, where $\mathcal{V}_s\cup\mathcal{V}_t\cup\mathcal{V}_{cut}=\mathcal{V}$ is the set of graph nodes. Building upon this idea, we propose BD-Index, a novel index structure that follows a divide-and-conquer strategy. The graph is first cut into pieces so that each part can be processed easily. Then, all the required random walk probabilities can be deterministically computed in a bottom-up manner. When a query arrives, only a small part of the index needs to be accessed. We prove that BD-Index requires $O(n\cdot h)$ space, can be built in $O(n\cdot h\cdot (h+d_{max}))$ time, and answers each query in $O(n\cdot h)$ time, where $h$ is the height of a hierarchy partition tree and $d_{max}$ is the maximum degree, both of which are usually much smaller than $n$.
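To make the random-walk view in the abstract concrete, the following is a minimal sketch, not the paper's formula or the BD-Index algorithm. It assumes an unweighted, connected, non-bipartite toy graph (the edge list is hypothetical) and uses the standard fact that the degree-normalized, mean-centered difference of expected visit counts of two walks started at $s$ and $t$ converges to $\mathbf{L}^\dagger(\mathbf{e}_s-\mathbf{e}_t)$. The helper names `adjacency` and `walk_difference` are illustrative only.

```python
import numpy as np

def adjacency(n, edges):
    """Dense adjacency matrix of an undirected, unweighted graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

def walk_difference(A, s, t, steps=2000):
    """Truncated, degree-normalized difference of expected visit counts
    for two random walks started at s and t (assumes a connected,
    non-bipartite graph so that the series converges)."""
    deg = A.sum(axis=0)
    P_T = A / deg                 # column-stochastic: column j is A[:, j] / deg[j]
    x = np.zeros(len(A))
    x[s] += 1.0                   # walk distribution from s ...
    x[t] -= 1.0                   # ... minus walk distribution from t
    total = np.zeros(len(A))
    for _ in range(steps):
        total += x                # accumulate the visit-count difference
        x = P_T @ x               # advance both walks by one step
    y = total / deg               # degree normalization
    return y - y.mean()           # remove the constant component

# Hypothetical toy graph: triangle 0-1-2 with a pendant path 2-3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
A = adjacency(5, edges)

y = walk_difference(A, 0, 4)
b_walk = float(y @ y)

# Reference value from the Laplacian pseudoinverse.
L = np.diag(A.sum(axis=0)) - A
L_pinv = np.linalg.pinv(L)
z = L_pinv[:, 0] - L_pinv[:, 4]
b_exact = float(z @ z)

print(b_walk, b_exact)
```

The number of steps needed grows as the walk mixes more slowly, which is the inefficiency the abstract attributes to walk-based methods on graphs that are easy to cut.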

Paper Structure

This paper contains 21 sections, 9 theorems, 9 equations, 11 figures, 5 tables, 4 algorithms.

Key Result

lemma 1

Let $\tilde{\boldsymbol{\tau}}^{(\infty)}_{s}$ denote the degree-normalized distribution of the expected visit counts for an infinite random walk starting from node $s$. Then, the biharmonic distance satisfies

Figures (11)

  • Figure 1: Graph $\mathcal{G}$ and the Moore–Penrose pseudoinverse $\mathbf{L}^\dagger$ of its Laplacian. For example, $b(2,5) = \bigl\|\mathbf{L}^{\dagger}\mathbf{e}_2 - \mathbf{L}^{\dagger}\mathbf{e}_5\bigr\|_2^2 = 1.28$.
  • Figure 2: Illustration of the proposed formulas of $\mathsf{BD}$. (a) $\mathsf{BD}$ can be interpreted by random walks from $s$ and $t$ on the whole graph; (b) $\mathsf{BD}$ can be interpreted by random walks from $s$ and $t$ until hitting $v$; (c) If $v$ is a cut vertex, $\mathsf{BD}$ can be interpreted by random walks independently on $\mathcal{V}_s$ and $\mathcal{V}_t$; (d) $\mathsf{BD}$ can be interpreted by random walks independently on $\mathcal{V}_s$ and $\mathcal{V}_t$, separated by a small cut set $\mathcal{V}_{cut}=\{v_7,v_8\}$.
  • Figure 3: Illustration of $\mathsf{BD\textrm{-}Index}$'s index structure, index building process, and query process
  • Figure 4: Query time compared with exact methods
  • Figure 5: Relative error of $\mathsf{BD\textrm{-}Index}$ and $\mathsf{LapSolver}$
  • ...and 6 more figures
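
The definition used in the Figure 1 caption, $b(s,t)=\|\mathbf{L}^\dagger\mathbf{e}_s-\mathbf{L}^\dagger\mathbf{e}_t\|_2^2$, can be evaluated directly on a small graph. The sketch below is a dense baseline for illustration only, assuming an unweighted, undirected graph. The toy edge list is hypothetical and is not the graph shown in Figure 1, and the helper names are not from the paper.

```python
import numpy as np

def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def biharmonic_distance(L_pinv, s, t):
    """b(s, t) = || L^+ e_s - L^+ e_t ||_2^2, read off two columns of L^+."""
    diff = L_pinv[:, s] - L_pinv[:, t]
    return float(diff @ diff)

# Hypothetical toy graph: triangle 0-1-2 with a pendant path 2-3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
L_pinv = np.linalg.pinv(laplacian(5, edges))
print(biharmonic_distance(L_pinv, 0, 4))
```

Forming the dense pseudoinverse takes cubic time and quadratic memory in the number of nodes, so this is only practical for small graphs.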

Theorems & Definitions (9)

  • lemma 1: Global Random Walk Representation of $\mathsf{BD}$
  • lemma 2: $v$-absorbed Random Walk Representation of $\mathsf{BD}$
  • lemma 3: Cut-vertex Random Walk Representation of $\mathsf{BD}$
  • lemma 4: Cut-set Random Walk Representation of $\mathsf{BD}$
  • lemma 5: Compact Representation of the Contribution Matrices
  • lemma 6: Bottom-Up Aggregation
  • lemma 7: Space Complexity of the Index Structure
  • lemma 8: Time Complexity of Index Building
  • lemma 9: Time Complexity of Query Processing