A Query-Driven Approach to Space-Efficient Range Searching

Dimitris Fotakis, Andreas Kalavas, Ioannis Psarros

TL;DR

This work addresses range-searching data structures that adapt to an unknown query distribution by using a sampling oracle. It links partition-tree design to the stabbing/visiting framework and shows that near-linear query samples can yield near-optimal expected visiting numbers, via a spanning-tree/MST approach. It also introduces separator-based partition trees, notably ring separators, to achieve fast, query-driven performance for ball ranges in Euclidean spaces, with provable bounds on expected query time and linear space. The authors validate their methods experimentally, including neural-network-based node processing and separator heuristics, demonstrating practical improvements in query efficiency and robustness across distributions.

Abstract

We initiate a study of a query-driven approach to designing partition trees for range-searching problems. Our model assumes that a data structure is to be built for an unknown query distribution that we can access through a sampling oracle, and must be selected such that it optimizes a meaningful performance parameter in expectation. Our first contribution is to show that a near-linear sample of queries allows the construction of a partition tree with a near-optimal expected number of nodes visited during querying. We enhance this approach by treating node processing as a classification problem, leveraging fast classifiers like shallow neural networks to obtain experimentally efficient query times. Our second contribution is to develop partition trees using sparse geometric separators. Our preprocessing algorithm, based on a sample of queries, builds a balanced tree with nodes associated with separators that minimize query stabs in expectation; this yields both fast processing of each node and a small number of visited nodes, significantly reducing query time.
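To make the performance parameter concrete, the following minimal Python sketch estimates the expected number of visited nodes of a partition tree from a sample of ball queries. It is an illustration under stated assumptions, not the authors' construction: the tree is a plain balanced kd-style split in the plane, nodes are summarized by bounding boxes, and all names (`build_tree`, `visited_nodes`, `empirical_visiting_number`, the demo data) are hypothetical.

```python
import math

def build_tree(points, leaf_size=1, depth=0):
    """Balanced kd-style partition tree (an assumed stand-in for a
    partition tree); each node stores the bounding box of its points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    node = {"box": (min(xs), min(ys), max(xs), max(ys)), "children": []}
    if len(points) > leaf_size:
        axis = depth % 2  # alternate between x and y splits
        pts = sorted(points, key=lambda p: p[axis])
        mid = len(pts) // 2
        node["children"] = [
            build_tree(pts[:mid], leaf_size, depth + 1),
            build_tree(pts[mid:], leaf_size, depth + 1),
        ]
    return node

def box_ball_relation(box, center, radius):
    """Classify a box against a closed ball: 'disjoint', 'inside', 'crossed'.
    The box test is conservative with respect to the actual point set."""
    x0, y0, x1, y1 = box
    cx, cy = center
    # Nearest point of the box to the ball center.
    nx = min(max(cx, x0), x1)
    ny = min(max(cy, y0), y1)
    near = math.hypot(cx - nx, cy - ny)
    # Farthest corner of the box from the ball center.
    fx = x0 if abs(cx - x0) > abs(cx - x1) else x1
    fy = y0 if abs(cy - y0) > abs(cy - y1) else y1
    far = math.hypot(cx - fx, cy - fy)
    if near > radius:
        return "disjoint"
    if far <= radius:
        return "inside"
    return "crossed"

def visited_nodes(node, center, radius):
    """Count nodes touched by a query; recurse only through crossed nodes."""
    count = 1
    if node["children"] and box_ball_relation(node["box"], center, radius) == "crossed":
        count += sum(visited_nodes(c, center, radius) for c in node["children"])
    return count

def empirical_visiting_number(tree, queries):
    """Average number of visited nodes over a sample of (center, radius) queries."""
    return sum(visited_nodes(tree, c, r) for c, r in queries) / len(queries)

# Hypothetical demo data: four points and three sampled ball queries.
demo_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
demo_tree = build_tree(demo_points)
demo_queries = [((0.0, 0.0), 0.5), ((0.5, 0.5), 10.0), ((100.0, 100.0), 1.0)]
demo_estimate = empirical_visiting_number(demo_tree, demo_queries)
```

Averaging over sampled queries is the natural empirical proxy for the expectation over ${\mathcal{D}}_Q$; a query-driven construction would then prefer trees for which this average is small.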

Paper Structure

This paper contains 19 sections, 16 theorems, 21 equations, 1 figure, 2 tables, 3 algorithms.

Key Result

Theorem 1.2

Given a set of $n$ points $P$, and $\tilde{O}(n)$ i.i.d. query ranges sampled from an unknown distribution ${\mathcal{D}}_Q$, we can build a partition tree $T$ on $P$ in $\tilde{O}(n^3)$ time, such that $T$ has an expected visiting number within $O(\log n)$ from the optimal for ${\mathcal{D}}_Q$. Th
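One plausible reading of the spanning-tree/MST idea mentioned in the TL;DR can be sketched as follows: weight each pair of points by the number of sampled ranges that contain exactly one of them, then take a minimum spanning tree under these weights. This is a hedged illustration only. The paper's actual algorithm and its guarantees may differ, ball ranges in the plane are an assumption, and the names `crossing_weights` and `prim_mst` are hypothetical.

```python
import math

def in_ball(p, ball):
    """True if point p lies in the closed ball given as (center, radius)."""
    (cx, cy), r = ball
    return math.hypot(p[0] - cx, p[1] - cy) <= r

def crossing_weights(points, sample):
    """w[i][j] = number of sampled ranges containing exactly one of
    points[i], points[j], i.e. ranges that would cross the edge (i, j)."""
    n = len(points)
    member = [[in_ball(p, q) for q in sample] for p in points]
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = sum(a != b for a, b in zip(member[i], member[j]))
    return w

def prim_mst(w):
    """Prim's algorithm on a dense weight matrix; returns a list of edges."""
    n = len(w)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax the connection cost of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v] = w[u][v]
                parent[v] = u
    return edges

# Hypothetical demo: two tight clusters and three sampled ball queries.
demo_pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
demo_sample = [((0.0, 0.0), 1.0), ((5.0, 5.0), 1.0), ((0.0, 0.0), 0.05)]
demo_w = crossing_weights(demo_pts, demo_sample)
demo_edges = prim_mst(demo_w)
demo_total = sum(demo_w[u][v] for u, v in demo_edges)
```

With $n$ points and $m$ sampled ranges, computing all pairwise weights this way takes $O(n^2 m)$ time, which for $m = \tilde{O}(n)$ is in line with the $\tilde{O}(n^3)$ preprocessing bound quoted in the theorem, although this sketch is not claimed to be the method behind that bound.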

Figures (1)

  • Figure 1: Average query times for partition trees (\ref{tab:bptree_qtime}) and ring trees (\ref{qt_ring}). For partition trees, times are per output point.

Theorems & Definitions (26)

  • Theorem 1.2: Informal version of \ref{thm:samplingcomplexity}
  • Theorem 1.3: Informal version of \ref{T424}
  • Theorem 2.1: 10.5555/338219.338267
  • Corollary 2.1
  • Lemma 3.0: Adapted from CW89
  • Corollary 3.1
  • Corollary 3.2
  • Lemma 3.2
  • Theorem 3.3
  • proof
  • ...and 16 more