Simpler is Faster: Practical Distance Reporting by Sorting Along a Space-Filling Curve
Sarita de Berg, Emil Toftegaard Gæde, Ivor van der Hoog, Henrik Reinstädtler, Eva Rotenberg
TL;DR
This paper addresses distance reporting queries with a minimal, practical technique: sort the input points along a space-filling curve and answer each query by scanning a small number of contiguous intervals. Working within a formal RSFC framework, the authors show that a query ball can be covered by at most 2^d hypercubes whose π-images are contiguous, which yields a simple array-based index that also supports dynamic updates. They implement both Hilbert and Z-curves and compare against eight state-of-the-art range-searching structures in static and dynamic settings, on real-world and synthetic data. The results show that the straightforward 200-line implementation based on SFC sorting matches or exceeds the performance of sophisticated structures in static scenarios and dominates in dynamic ones, with early-termination enhancements further improving performance for empty queries. Overall, the work questions whether complex hierarchical indexes are necessary for distance queries: a well-chosen space-filling order can be enough for strong practical performance, with far simpler code.
Abstract
Range reporting is a classical problem in computational geometry. A (rectangular) reporting data structure stores a point set $P$ such that, given a (rectangular) query region $Δ$, it returns all points in $P \cap Δ$. A variety of data structures, such as k-d trees, range trees, R-trees, and quadtrees, support such queries with differing asymptotic guarantees. A common variant of range queries is the distance reporting query, where the input is a query point $q$ and a radius $δ$, and the goal is to report all points in $P$ within distance $δ$ of $q$. Such queries frequently arise as subroutines in geometric data structures. Practical implementations typically answer distance queries through rectangular range queries, using the data structures listed above. This paper revisits a simple and practical heuristic for distance reporting, originally proposed in TCS'97: sort the input point set $P$ along a space-filling curve. Queries then reduce to scanning at most four contiguous ranges along the sorted curve. It is well known that sorting along a space-filling curve is beneficial for range reporting, and many implementations use this technique to speed up their query and construction times. The point that this paper makes is subtle but interesting: we argue that it is often the space-filling curve, rather than the overall data structure, that provides the performance benefits. Thus, we offer a simple but effective alternative: only sort $P$ along a space-filling curve. We compare this approach to eight range-searching implementations across an elaborate test suite of real-world and synthetic data. Our experiments confirm that this simple 200-line approach outperforms all high-end implementations in terms of space usage and construction time, and it almost always achieves the best query times. In a dynamic setting, our approach dominates in performance.
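To make the idea concrete, the following is a minimal sketch of sorting along a space-filling curve and answering a distance query by scanning at most four contiguous ranges. It is not the authors' implementation. It assumes two dimensions, a Z-curve (Morton order), non-negative integer coordinates below $2^{16}$, and an integer radius; the names `morton`, `build`, `clamp`, and `report` are hypothetical. The covering argument used here is that a ball of radius $δ$ fits in a box of side $2δ$, which meets at most two aligned cells of side $2^k \ge 2δ$ per axis, and each such cell is one contiguous key range on the Z-curve.

```python
from bisect import bisect_left

BITS = 16                      # assumed coordinate width: 0 <= x, y < 2**16
MAX_COORD = (1 << BITS) - 1


def morton(x, y):
    """Interleave the bits of x and y into a Z-curve (Morton) key."""
    code = 0
    for i in range(BITS):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def build(points):
    """Sort the points along the Z-curve; return parallel key and point lists."""
    order = sorted(points, key=lambda p: morton(p[0], p[1]))
    keys = [morton(x, y) for x, y in order]
    return keys, order


def clamp(v):
    """Clamp a coordinate to the assumed domain [0, MAX_COORD]."""
    return max(0, min(v, MAX_COORD))


def report(keys, order, qx, qy, r):
    """Return all stored points within Euclidean distance r of (qx, qy)."""
    # Smallest cell side 2**k with 2**k >= 2r, capped at the domain size.
    k = 0
    while k < BITS and (1 << k) < 2 * r:
        k += 1
    # The ball's bounding box meets at most two cells of side 2**k per axis.
    xs = {clamp(qx - r) >> k, clamp(qx + r) >> k}
    ys = {clamp(qy - r) >> k, clamp(qy + r) >> k}
    out = []
    for cx in xs:
        for cy in ys:
            # An aligned cell is one contiguous key range on the Z-curve.
            start = morton(cx, cy) << (2 * k)
            end = start + (1 << (2 * k))
            lo = bisect_left(keys, start)
            hi = bisect_left(keys, end)
            for x, y in order[lo:hi]:
                if (x - qx) ** 2 + (y - qy) ** 2 <= r * r:
                    out.append((x, y))
    return out
```

A call such as `keys, order = build(points)` followed by `report(keys, order, qx, qy, r)` performs at most four binary-searched scans over the sorted array. A Hilbert curve could replace `morton` under the same scheme, since aligned cells are also contiguous along it, but that variant is not shown here.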
