Simpler is Faster: Practical Distance Reporting by Sorting Along a Space-Filling Curve

Sarita de Berg, Emil Toftegaard Gæde, Ivor van der Hoog, Henrik Reinstädtler, Eva Rotenberg

TL;DR

This paper tackles distance reporting queries by proposing a minimal, practical technique: sort the input points along a space-filling curve and answer queries by scanning a small number of contiguous intervals. Grounded in a formal RSFC framework, the authors show that a query ball can be covered by at most 2^d hypercubes whose π-images are contiguous, enabling a simple, array-based index with dynamic support. They implement both Hilbert and Z-curves and compare against eight state-of-the-art range-searching structures across static and dynamic settings, real-world and synthetic data. The results demonstrate that the straightforward 200-line code based on SFC sorting matches or exceeds the performance of sophisticated structures in static scenarios and dominates in dynamic contexts, with early-termination enhancements further boosting performance for empty queries. Overall, the work challenges the necessity of complex hierarchical indexes for distance queries, highlighting that a well-chosen space-filling order can suffice for strong practical performance and simplicity.

Abstract

Range reporting is a classical problem in computational geometry. A (rectangular) reporting data structure stores a point set $P$, such that, given a (rectangular) query region $\Delta$, it returns all points in $P \cap \Delta$. A variety of data structures, such as k-d trees, range trees, R-trees, and quadtrees, support such queries with differing asymptotic guarantees. A common variant of the range query is the distance reporting query, where the input is a query point $q$ and a radius $\delta$, and the goal is to report all points in $P$ within distance $\delta$ of $q$. Such queries frequently arise as subroutines in geometric data structures. Practical implementations typically answer distance queries through rectangular range queries using the data structures listed above. This paper revisits a simple and practical heuristic for distance reporting, originally proposed in TCS'97: sort the input point set $P$ along a space-filling curve. Queries then reduce to scanning at most four contiguous ranges along the sorted curve. The fact that sorting along a space-filling curve is beneficial for range reporting is well known. Many implementations use this technique to speed up their query and construction times. The point that this paper makes is subtle, but interesting: we argue that often, it is the space-filling curve rather than the overall data structure that provides the performance benefits. Thus, we offer a simple but effective alternative: only sort $P$ along a space-filling curve. We compare this approach to eight range searching implementations, across an elaborate test suite of real-world and synthetic data. Our experiments confirm that this simple 200-line approach outperforms all high-end implementations in terms of space usage and construction time. It almost always achieves the best query times. In a dynamic setting, our approach dominates in performance.
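The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a Z-curve (Morton order) in two dimensions, integer coordinates in $[0, 2^{16})$, and Euclidean distance, and the names `BITS`, `morton`, `build`, `covering_ranges`, and `report` are hypothetical. The covering step picks the smallest power-of-two cell side that is at least $2\delta$, so the query ball touches at most two cells per axis, hence at most four cells, each of which is one contiguous interval of Morton codes.

```python
from bisect import bisect_left
from math import ceil, floor

BITS = 16  # assumption: coordinates are integers in [0, 2**BITS)


def morton(x: int, y: int) -> int:
    """Interleave the bits of x and y (Z-curve / Morton code)."""
    code = 0
    for i in range(BITS):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def build(points):
    """The whole index is one array of (code, x, y), sorted by Morton code."""
    return sorted((morton(x, y), x, y) for x, y in points)


def covering_ranges(qx, qy, r):
    """Half-open code intervals of the at most 4 aligned cells covering the ball."""
    k = 0
    while (1 << k) < 2 * r and k < BITS:  # smallest cell side 2**k >= 2r
        k += 1
    top = (1 << BITS) - 1
    lo_x, hi_x = max(0, ceil(qx - r)), min(top, floor(qx + r))
    lo_y, hi_y = max(0, ceil(qy - r)), min(top, floor(qy + r))
    if lo_x > hi_x or lo_y > hi_y:
        return []  # the ball contains no grid point
    cells = {(cx, cy) for cx in (lo_x >> k, hi_x >> k) for cy in (lo_y >> k, hi_y >> k)}
    ranges = []
    for cx, cy in cells:
        base = morton(cx << k, cy << k)  # code of the cell's lowest corner
        ranges.append((base, base + (1 << (2 * k))))  # a cell holds 4**k codes
    return sorted(ranges)


def report(index, qx, qy, r):
    """Report all indexed points within distance r of (qx, qy)."""
    out = []
    for lo, hi in covering_ranges(qx, qy, r):
        i = bisect_left(index, (lo,))  # first entry with code >= lo
        while i < len(index) and index[i][0] < hi:
            _, x, y = index[i]
            if (x - qx) ** 2 + (y - qy) ** 2 <= r * r:
                out.append((x, y))
            i += 1
    return out


index = build([(1, 1), (2, 3), (10, 10), (3, 2)])
print(sorted(report(index, 2, 2, 1.5)))  # [(1, 1), (2, 3), (3, 2)]
```

In this sketch the scan filters every point in the covering cells by its true distance, so the covering only has to be conservative. Any speed-ups the paper applies on top, such as early termination for empty queries, are not modelled here.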

Paper Structure

This paper contains 43 sections, 2 theorems, 14 figures, 18 tables, 2 algorithms.

Key Result

Lemma 3

There is a one-to-one correspondence between the orderings $\sigma$ of $D$ from Definition 1 and the mappings $\pi$ of $D$ from Definition 3.

Figures (14)

  • Figure 1: A Hilbert curve visits quadtree cells in a fixed order, ordering the input.
  • Figure 2: Construction time of the data structure, in seconds per point, for $10^4$ to $10^9$ points.
  • Figure 3: Overall query time to answer $10^6$ uniform queries of size $\delta=0.001$ for $10^4$ to $10^9$ points.
  • Figure 4: Maximum resident memory per point for $10^6$ to $10^9$ points, reported via /proc/. This measure includes the queries and points once. PAM requires more memory; see the memory-scaling table in the appendix.
  • Figure 5: Query time in seconds as a function of the relative length of the query window.
  • ...and 9 more figures
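
The ordering described in the Figure 1 caption can be sketched in Python with the widely known iterative conversion from grid cell to Hilbert index. This is an illustrative sketch on a $2^{\text{order}} \times 2^{\text{order}}$ grid, not the paper's code, and the name `hilbert_index` is hypothetical. Sorting points by this index places every aligned quadtree cell in one contiguous block of the sorted array.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant at this level, in curve order
        # Rotate/reflect so the sub-quadrant is traversed in canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Sorting by the index orders the input along the curve.
points = [(3, 0), (0, 0), (1, 1), (2, 2), (0, 3)]
ordered = sorted(points, key=lambda p: hilbert_index(2, p[0], p[1]))
```

Unlike the Z-curve, consecutive cells along the Hilbert curve are always grid neighbours, which is the usual argument for its better locality; the price is a slightly more involved index computation.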

Theorems & Definitions (6)

  • Definition 1: RSFC in Asano et al. (TCS'97)
  • Definition 2: Complete quadtree
  • Definition 3: RSFC in $\mathbb{Z}^d$
  • Lemma 3: Proof in the appendix
  • Definition 5
  • Lemma 5: Proof in the appendix