Space-Efficient Approximate Spherical Range Counting in High Dimensions

Andreas Kalavas; Ioannis Psarros

Abstract

We study the following range searching problem in high-dimensional Euclidean spaces: given a finite set $P\subset \mathbb{R}^d$, where each $p\in P$ is assigned a weight $w_p$, and a radius $r>0$, we need to preprocess $P$ into a data structure such that, when a new query point $q\in \mathbb{R}^d$ arrives, the data structure reports the cumulative weight of the points of $P$ within Euclidean distance $r$ from $q$. Solving the problem exactly seems to require space usage that is exponential in the dimension, a phenomenon known as the curse of dimensionality. Thus, we focus on approximate solutions where points up to $(1+\varepsilon)r$ away from $q$ may be taken into account, where $\varepsilon>0$ is an input parameter known during preprocessing. We build a data structure with near-linear space usage and query time in $n^{1-\Theta(\varepsilon^4/\log(1/\varepsilon))}+t_q^{\varrho}\cdot n^{1-\varrho}$, for some $\varrho=\Theta(\varepsilon^2)$, where $t_q$ is the number of points of $P$ in the ambiguity zone, i.e., at distance between $r$ and $(1+\varepsilon)r$ from the query $q$. To the best of our knowledge, this is the first data structure with efficient space usage (subquadratic or near-linear for any $\varepsilon>0$) and query time that remains sublinear for any sublinear $t_q$. We supplement our worst-case bounds with a query-driven preprocessing algorithm to build data structures that are well-adapted to the query distribution.
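To make the problem statement concrete, the following is a minimal brute-force sketch of the query semantics only. It is not the paper's data structure and runs in linear time per query. The function name `range_count_bounds`, the assumption of nonnegative weights, and the convention that the ambiguity zone is the half-open shell $r < \|p-q\| \le (1+\varepsilon)r$ are illustrative choices that do not come from the source.

```python
import math

def range_count_bounds(points, weights, q, r, eps):
    """Linear-scan reference for approximate spherical range counting.

    Returns (lower, upper, t_q), where
      lower = total weight of points within distance r of q,
      upper = total weight of points within distance (1 + eps) * r of q,
      t_q   = number of points in the ambiguity zone (r, (1 + eps) * r].
    Assuming nonnegative weights, any value in [lower, upper] is an
    acceptable answer to the approximate query.
    """
    inner = 0.0   # weight that must be counted
    zone = 0.0    # weight that may or may not be counted
    t_q = 0
    for p, w in zip(points, weights):
        dist = math.dist(p, q)          # Euclidean distance (Python 3.8+)
        if dist <= r:
            inner += w
        elif dist <= (1 + eps) * r:
            zone += w
            t_q += 1
    return inner, inner + zone, t_q

# Hypothetical toy input: four weighted points on a line in R^2.
pts = [(0.0, 0.0), (1.0, 0.0), (1.05, 0.0), (3.0, 0.0)]
wts = [1, 2, 4, 8]
print(range_count_bounds(pts, wts, (0.0, 0.0), 1.0, 0.1))
```

A scan like this is useful mainly as a correctness oracle when testing an approximate structure: the reported weight should fall between the first two returned values, and the third value is the quantity $t_q$ that appears in the query-time bound.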

Paper Structure (15 sections, 24 theorems, 6 equations, 1 table, 2 algorithms)


Preliminaries
Range spaces
Embeddings
Model
Approximate Stabbing Queries
Spanning Trees with low $\varepsilon$-stabbing number
Spherical Range Counting
Data-driven Range Searching
Supplementary Material for \ref{sec:prel}
Proof of \ref{theo:etaepsilonapproxfunctionsrelative}
Proof of \ref{lemma:multisampling}
Supplementary Material for \ref{section:stabbing}
Pseudocode of \ref{thm:stabds}
Supplementary Material for \ref{sec:datadr}
Proof of \ref{lem:ubstabbing}

Key Result

Theorem 1

Given a set of $n$ points in $\mathbb{R}^d$, we can build a randomized data structure for the approximate range counting problem, with space usage in $\tilde{O}(n)$, preprocessing time in $O(dn)+n^{\mathsf{poly}(1/\varepsilon)}$, and sublinear query time if $t_q$ is also sublinear.
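The sublinearity claim can be checked directly against the query-time bound stated in the abstract. The short derivation below is a sketch under two assumptions that are not stated in the source: $\varepsilon\in(0,1)$ is a fixed constant, and $t_q \le n^{\alpha}$ for some illustrative exponent $\alpha\in[0,1)$.

```latex
\begin{align*}
t_q^{\varrho}\cdot n^{1-\varrho}
  &\le \left(n^{\alpha}\right)^{\varrho}\cdot n^{1-\varrho}
   = n^{1-(1-\alpha)\varrho},
   \qquad \varrho=\Theta(\varepsilon^2), \\
n^{1-\Theta(\varepsilon^4/\log(1/\varepsilon))} + t_q^{\varrho}\cdot n^{1-\varrho}
  &\le n^{1-\Theta(\varepsilon^4/\log(1/\varepsilon))} + n^{1-(1-\alpha)\varrho}
   = o(n).
\end{align*}
```

More generally, writing the second term as $n\cdot(t_q/n)^{\varrho}$ shows that it is $o(n)$ whenever $t_q=o(n)$ and $\varrho>0$ is fixed, while for $t_q=\Theta(n)$ the bound degrades to linear.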

Theorems & Definitions (24)

Theorem 1: Simplified version of \ref{thm:final}
Theorem 2: Simplified version of \ref{thm:samplingcomplexity}
Theorem 3: $\varepsilon$-net theorem HW87
Theorem 4: Restated from 10.5555/338219.338267
Corollary 4
Lemma 4
Theorem 5: IN07
Theorem 6: NN19
Lemma 6
Lemma 6
...and 14 more
