A Sublinear Algorithm for Approximate Shortest Paths in Large Networks

Sabyasachi Basu, Nadia Kōshima, Talya Eden, Omri Ben-Eliezer, C. Seshadhri

TL;DR

This work introduces WormHole, a sublinear-index algorithm for approximate shortest-path queries in large networks. It leverages core-periphery structure by constructing a sublinear inner core and routing queries through this core to produce exact or near-exact paths with small additive error. Theoretical guarantees are proved under the Chung-Lu power-law model: the additive error is $O(\log\log n)$ and the per-inquiry cost is $n^{o(1)}$, while preprocessing is $o(n)$, contrasting with the $n^{\Omega(1)}$ costs of non-preprocessed methods. Empirical results across diverse real-world networks show fast setup (often minutes), low query cost (often a small fraction of the graph), and high accuracy (most queries with additive error $\le 2$), with variants trading accuracy for speed and enabling combination with full-index methods on the core.

Abstract

Computing distances and finding shortest paths in massive real-world networks is a fundamental algorithmic task in network analysis. There are two main approaches to solving this task. On one hand are traversal-based algorithms like bidirectional breadth-first search (BiBFS) with no preprocessing step and slow individual distance inquiries. On the other hand are indexing-based approaches, which maintain a large index. This allows for answering individual inquiries very fast; however, index creation is prohibitively expensive. We seek to bridge these two extremes: quickly answer distance inquiries without the need for costly preprocessing. In this work, we propose a new algorithm and data structure, WormHole, for approximate shortest path computations. WormHole leverages structural properties of social networks to build a sublinearly sized index, drawing upon the explicit core-periphery decomposition of Ben-Eliezer et al. Empirically, the preprocessing time of WormHole improves upon index-based solutions by orders of magnitude, and individual inquiries are consistently much faster than in BiBFS. The acceleration comes at the cost of a minor accuracy trade-off. Nonetheless, our empirical evidence demonstrates that WormHole accurately answers essentially all inquiries within a maximum additive error of 2. We complement these empirical results with provable theoretical guarantees, showing that WormHole requires $n^{o(1)}$ node queries per distance inquiry in random power-law networks. In contrast, any approach without a preprocessing step requires $n^{\Omega(1)}$ queries for the same task. WormHole does not require reading the whole graph. Unlike the vast majority of index-based algorithms, it returns paths, not just distances. For faster inquiry times, it can be combined effectively with other index-based solutions, by running them only on the sublinear core.

Paper Structure

This paper contains 25 sections, 4 theorems, 1 equation, 4 figures, and 2 algorithms.

Key Result

Theorem 4.1

Suppose a power-law random graph has exponent $2<\beta<3$, average degree $d$ strictly greater than 1, and maximum degree $d_{\max}>\log n/\log\log n$. Then almost surely the diameter is $\Theta(\log n)$, the diameter of the $\mathcal{C}_{\textsf{CL}}$ core is $O(\log \log n)$ and almost all vertices …

Figures (4)

  • Figure 1: We illustrate the average running time per shortest path inquiry for three variants of WormHole, as compared to index-based (MLL and PLL) and traversal-based (BiBFS) competitors. PLL only finds distances, not paths. DNF marks that the preprocessing (index construction) step did not finish. All three of our variants outperformed BiBFS consistently. Index-based solutions, on the other hand, generally failed on medium to large graphs as the index construction phase timed out. We note that even in smaller graphs where the index construction of MLL and PLL completed successfully, our fastest variant $\texttt{WormHole}_M$ has comparable per-inquiry running time.
  • Figure 2: (a) a comparison of the footprint in terms of disk space for different methods. The indexing based methods did not terminate on graphs larger than these. For WormHole, we consider the sum of $\mathcal{C}_{{\textsf{in}}}$ and $\mathcal{C}_{{\textsf{out}}}$ binary files. Note that PLL here is the distance algorithm, solving a weaker problem. The red bar "Input" is the size of the edge list. (b) we look at the number of vertices queried (visited) by BiBFS (dotted lines) and WormHole (solid lines) (the number is the same for all three variants). Observe that while BiBFS ends up seeing between 70% and 100% of the vertices in just a few hundred inquiries, we are well below 20% even after 5000 inquiries.
  • Figure 3: The decomposition and some representative cases where WormHole succeeds or fails. The central blue region is the inner ring $\mathcal{C}_{{\textsf{in}}}$, the green layer outside is the outer ring $\mathcal{C}_{{\textsf{out}}}$, and purple regions attached to this are the peripheral components forming $\mathcal{P}$. Dashed lines are edges. Vertices labelled $s$ and $t$ are respectively the source and destination. The green lines are actual shortest paths, while the black lines are paths output by WormHole. We ignore the case where the source and destination are in the same peripheral component. The first one (a) is the case where the shortest path and the path output by WormHole are identical; no error is incurred in this case. The second (b) is the case where the source and destination are in two different peripheral components, but they encounter a common vertex while traversing to the inner ring. The third (c) is an example of a case where we incur an error: the shortest paths interleave through the outer ring $\mathcal{C}_{{\textsf{out}}}$, so by restricting the traversal solely to the interior of $\mathcal{C}_{{\textsf{in}}}$, we incur an error, in this case of $1$.
  • Figure 4: Construction of the inner and outer rings of the core; figure taken from BEOF22. At any point, the outer ring is the set of vertices adjacent to the inner ring. The algorithm expands the inner ring by adding to it a vertex from the outer ring that has the most neighbors in the inner ring. The image shows two successive steps: the number labelling each outer-ring vertex is how many inner-ring vertices it is adjacent to. Thus, in the second step, the vertex labelled 2 is added to the inner ring.
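
The greedy expansion described in the Figure 4 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_core`, the adjacency-list input `adj`, the choice of a single `seed` vertex, the stopping rule `inner_size`, and the tie-breaking by smallest label are all hypothetical. A practical version would likely replace the linear scan with a priority queue.

```python
from collections import defaultdict


def build_core(adj, seed, inner_size):
    """Greedily grow an inner ring from `seed` (assumed starting point).

    adj        : dict mapping each vertex to a list of its neighbours
    inner_size : assumed stopping rule (target number of inner-ring vertices)
    Returns (inner, outer), where outer is the set of vertices adjacent
    to the inner ring but not in it.
    """
    inner = {seed}
    # count[v] = number of inner-ring neighbours of outer-ring vertex v
    count = defaultdict(int)
    for u in adj[seed]:
        count[u] += 1

    while len(inner) < inner_size and count:
        # Pick the outer-ring vertex with the most inner-ring neighbours.
        # Scanning in sorted order breaks ties by smallest label (assumption).
        v = max(sorted(count), key=count.get)
        del count[v]
        inner.add(v)
        # Neighbours of v outside the inner ring gain one inner-ring neighbour.
        for u in adj[v]:
            if u not in inner:
                count[u] += 1

    outer = set(count)
    return inner, outer


# Toy graph: a dense triangle-like centre (0, 1, 2, 3) with two leaves (4, 5).
demo_adj = {
    0: [1, 2, 3],
    1: [0, 2, 4],
    2: [0, 1, 3],
    3: [0, 2, 5],
    4: [1],
    5: [3],
}
demo_inner, demo_outer = build_core(demo_adj, seed=0, inner_size=3)
print(sorted(demo_inner), sorted(demo_outer))
```

Only the neighbourhoods of vertices already placed in the inner ring are read, which matches the caption's point that the outer ring is always just the set of vertices adjacent to the inner ring.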

Theorems & Definitions (5)

  • Theorem 4.1: Theorem 4 in chung2004average
  • Claim 4.2: Fact 2 in chung2002connected
  • Theorem 4.3: WormHole inner ring $\supseteq$ Chung-Lu core
  • Theorem 4.4: Good additive error
  • Theorem 4.5: Subpolynomial query complexity for shortest paths
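
To make the routing idea behind the additive-error statements concrete, the following Python sketch walks from each endpoint to the inner ring and then connects the two entry points using inner-ring vertices only, as in the cases of Figure 3. It is a simplified illustration under assumptions, not the paper's algorithm: the helper names `bfs_path` and `route_through_core` are hypothetical, plain BFS stands in for bidirectional search, and the inner ring is assumed to be given as a set `inner`.

```python
from collections import deque


def bfs_path(adj, source, is_target, allowed=None):
    """BFS from `source`; return the path to the first vertex for which
    is_target(v) is true, or None. If `allowed` is given, the search only
    steps onto vertices in that set."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if is_target(u):
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj[u]:
            if w not in parent and (allowed is None or w in allowed):
                parent[w] = u
                queue.append(w)
    return None


def route_through_core(adj, inner, s, t):
    """Return an s-t path routed through the inner ring, or None."""
    # Phase 1: walk from each endpoint to its nearest inner-ring vertex.
    to_core_s = bfs_path(adj, s, lambda v: v in inner)
    to_core_t = bfs_path(adj, t, lambda v: v in inner)
    if to_core_s is None or to_core_t is None:
        return None

    # If the two walks share a vertex (Figure 3, case (b)), splice there.
    pos = {v: i for i, v in enumerate(to_core_s)}
    for j, v in enumerate(to_core_t):
        if v in pos:
            return to_core_s[: pos[v] + 1] + to_core_t[:j][::-1]

    # Phase 2: connect the entry points using inner-ring vertices only.
    a, b = to_core_s[-1], to_core_t[-1]
    middle = bfs_path(adj, a, lambda v: v == b, allowed=inner)
    if middle is None:
        return None
    return to_core_s + middle[1:] + to_core_t[::-1][1:]


# Toy graph with an assumed inner ring {0, 1, 2}; 4 and 5 are peripheral.
route_adj = {
    0: [1, 2, 3],
    1: [0, 2, 4],
    2: [0, 1, 3],
    3: [0, 2, 5],
    4: [1],
    5: [3],
}
route = route_through_core(route_adj, inner={0, 1, 2}, s=4, t=5)
print(route)
```

Because the middle segment never leaves the inner ring, a shortest path that dips into the outer ring (Figure 3, case (c)) is missed, which is the source of the small additive error.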