Space-Efficient Approximate Spherical Range Counting in High Dimensions

Andreas Kalavas, Ioannis Psarros

Abstract

We study the following range searching problem in high-dimensional Euclidean spaces: given a finite set $P\subset \mathbb{R}^d$, where each $p\in P$ is assigned a weight $w_p$, and a radius $r>0$, we need to preprocess $P$ into a data structure such that when a new query point $q\in \mathbb{R}^d$ arrives, the data structure reports the cumulative weight of the points of $P$ within Euclidean distance $r$ from $q$. Solving the problem exactly seems to require space usage that is exponential in the dimension, a phenomenon known as the curse of dimensionality. Thus, we focus on approximate solutions where points up to $(1+\varepsilon)r$ away from $q$ may be taken into account, where $\varepsilon>0$ is an input parameter known during preprocessing. We build a data structure with near-linear space usage, and query time in $n^{1-\Theta(\varepsilon^4/\log(1/\varepsilon))}+t_q^{\varrho}\cdot n^{1-\varrho}$, for some $\varrho=\Theta(\varepsilon^2)$, where $t_q$ is the number of points of $P$ in the ambiguity zone, i.e., at distance between $r$ and $(1+\varepsilon)r$ from the query $q$. To the best of our knowledge, this is the first data structure with efficient space usage (subquadratic or near-linear for any $\varepsilon>0$) and query time that remains sublinear for any sublinear $t_q$. We supplement our worst-case bounds with a query-driven preprocessing algorithm to build data structures that are well-adapted to the query distribution.
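
To make the problem statement concrete, the following is a minimal brute-force sketch of what an approximate answer is allowed to be. It is a linear-scan reference for the semantics only, not the paper's data structure. The function name `range_count_bounds`, the assumption that weights are nonnegative, and the choice to treat the ambiguity zone as the half-open interval $(r, (1+\varepsilon)r]$ are all illustrative assumptions that do not come from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def range_count_bounds(
    points: Sequence[Point],
    weights: Sequence[float],
    q: Point,
    r: float,
    eps: float,
) -> Tuple[float, float, int]:
    """Brute-force reference (hypothetical helper, not from the paper).

    Returns (lower, upper, t_q) where
      lower = total weight of points within distance r of q,
      upper = lower + total weight of points in the ambiguity zone,
      t_q   = number of points in the ambiguity zone.
    Assuming nonnegative weights, any value in [lower, upper] is an
    acceptable approximate answer under the abstract's definition.
    """
    lower = 0.0
    ambiguous_weight = 0.0
    t_q = 0
    for p, w in zip(points, weights):
        dist = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if dist <= r:
            lower += w
        elif dist <= (1 + eps) * r:
            # Assumed boundary convention: zone is (r, (1+eps)r].
            ambiguous_weight += w
            t_q += 1
    return lower, lower + ambiguous_weight, t_q


points = [(0.0, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 0.0)]
weights = [1.0, 2.0, 4.0, 8.0]
print(range_count_bounds(points, weights, (0.0, 0.0), r=1.0, eps=0.1))
# → (3.0, 7.0, 1)
```

This scan takes $O(dn)$ time per query, which is the baseline that a sublinear query time is measured against; the returned `t_q` is the quantity that appears in the query-time bound.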

Paper Structure (15 sections, 24 theorems, 6 equations, 1 table, 2 algorithms)


Key Result

Theorem 1

Given a set of $n$ points in $\mathbb{R}^d$, we can build a randomized data structure for the approximate range counting problem, with space usage in $\tilde{O}(n)$, preprocessing time in $O(dn)+n^{\mathsf{poly}(1/\varepsilon)}$, and query time that is sublinear whenever $t_q$ is sublinear.
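
As a short sanity check of the sublinearity claim, one can rewrite the $t_q$-dependent term of the query-time bound stated in the abstract. This is a sketch that ignores hidden constants and any polylogarithmic factors, and the parametrization $t_q \le n^{\alpha}$ is an illustrative assumption rather than notation from the paper:

$$t_q^{\varrho}\cdot n^{1-\varrho} \;=\; n\cdot\left(\frac{t_q}{n}\right)^{\varrho},$$

so for a fixed $\varrho=\Theta(\varepsilon^2)>0$, the term is $o(n)$ whenever $t_q=o(n)$. In particular, if $t_q \le n^{\alpha}$ for some $\alpha\in[0,1)$, then

$$t_q^{\varrho}\cdot n^{1-\varrho} \;\le\; n^{\alpha\varrho}\cdot n^{1-\varrho} \;=\; n^{1-(1-\alpha)\varrho},$$

and the exponent is strictly less than $1$. The other term, $n^{1-\Theta(\varepsilon^4/\log(1/\varepsilon))}$, does not depend on $t_q$ and is sublinear for any fixed $\varepsilon\in(0,1)$.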

Theorems & Definitions (24)

  • Theorem 1: Simplified version of \ref{thm:final}
  • Theorem 2: Simplified version of \ref{thm:samplingcomplexity}
  • Theorem 3: $\varepsilon$-net theorem [HW87]
  • Theorem 4: Restated from [doi:10.5555/338219.338267]
  • Corollary 4
  • Lemma 4
  • Theorem 5: [IN07]
  • Theorem 6: [NN19]
  • Lemma 6
  • Lemma 6
  • ...and 14 more