A Sublinear Algorithm for Approximate Shortest Paths in Large Networks
Sabyasachi Basu, Nadia Kōshima, Talya Eden, Omri Ben-Eliezer, C. Seshadhri
TL;DR
This work introduces WormHole, a sublinear-index algorithm for approximate shortest-path queries in large networks. It leverages core-periphery structure by constructing a sublinear inner core and routing queries through this core to produce exact or near-exact paths with small additive error. Theoretical guarantees are proved under the Chung-Lu power-law model: the additive error is $O(\log\log n)$ and the per-inquiry cost is $n^{o(1)}$, while preprocessing is $o(n)$, contrasting with the $n^{\Omega(1)}$ costs of non-preprocessed methods. Empirical results across diverse real-world networks show fast setup (often minutes), low query cost (often a small fraction of the graph), and high accuracy (most queries with additive error $\le 2$), with variants trading accuracy for speed and enabling combination with full-index methods on the core.
Abstract
Computing distances and finding shortest paths in massive real-world networks is a fundamental algorithmic task in network analysis. There are two main approaches to solving this task. On one hand are traversal-based algorithms like bidirectional breadth-first search (BiBFS) with no preprocessing step and slow individual distance inquiries. On the other hand are indexing-based approaches, which maintain a large index. This allows for answering individual inquiries very fast; however, index creation is prohibitively expensive. We seek to bridge these two extremes: quickly answer distance inquiries without the need for costly preprocessing. In this work, we propose a new algorithm and data structure, WormHole, for approximate shortest path computations. WormHole leverages structural properties of social networks to build a sublinearly sized index, drawing upon the explicit core-periphery decomposition of Ben-Eliezer et al. Empirically, the preprocessing time of WormHole improves upon index-based solutions by orders of magnitude, and individual inquiries are consistently much faster than in BiBFS. The acceleration comes at the cost of a minor accuracy trade-off. Nonetheless, our empirical evidence demonstrates that WormHole accurately answers essentially all inquiries within a maximum additive error of 2. We complement these empirical results with provable theoretical guarantees, showing that WormHole requires $n^{o(1)}$ node queries per distance inquiry in random power-law networks. In contrast, any approach without a preprocessing step requires $n^{\Omega(1)}$ queries for the same task. WormHole does not require reading the whole graph. Unlike the vast majority of index-based algorithms, it returns paths, not just distances. For faster inquiry times, it can be combined effectively with other index-based solutions, by running them only on the sublinear core.
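To make the routing idea concrete, the following is a minimal Python sketch of answering a path inquiry by routing through a core. It is an illustration under stated assumptions, not the authors' implementation: the core is assumed to be given as a set of nodes (its construction is not shown), plain one-sided BFS stands in for BiBFS, and the names `bfs_path`, `route_through_core`, `adj`, and `core` are hypothetical.

```python
from collections import deque


def bfs_path(adj, source, is_target, allowed=None):
    """BFS from `source`; return a shortest path to the first node that
    satisfies `is_target`, or None. If `allowed` is given, the search only
    steps onto nodes in that set."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if is_target(u):
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and (allowed is None or v in allowed):
                parent[v] = u
                queue.append(v)
    return None


def route_through_core(adj, core, s, t):
    """Approximate s-t path: walk from s and from t to the core, then
    connect the two entry points using core nodes only (assumed scheme)."""
    # Search outward from s; stop at the core, or at t if it is reached first.
    from_s = bfs_path(adj, s, lambda u: u in core or u == t)
    if from_s is None:
        return None
    if from_s[-1] == t:
        return from_s  # t was reached directly, so this path is exact
    from_t = bfs_path(adj, t, lambda u: u in core)
    if from_t is None:
        return None
    entry_s, entry_t = from_s[-1], from_t[-1]
    # Shortest path between the entry points, restricted to the core.
    inner = bfs_path(adj, entry_s, lambda u: u == entry_t, allowed=core)
    if inner is None:
        return None
    # Stitch the pieces; the result is a valid walk, possibly not shortest.
    return from_s[:-1] + inner + from_t[::-1][1:]


# Toy graph: two hub nodes form the assumed core.
adj = {
    "s": ["a"], "a": ["s", "h1"], "h1": ["a", "h2", "x"],
    "h2": ["h1", "b", "y"], "b": ["h2", "t"], "t": ["b"],
    "x": ["h1"], "y": ["h2"],
}
core = {"h1", "h2"}
path = route_through_core(adj, core, "s", "t")
```

In this sketch only the neighborhoods of `s` and `t` and the core itself are explored, which is the intuition behind the low per-inquiry cost. Any additive error comes from forcing the middle segment to stay inside the core.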
