Range Counting Oracles for Geometric Problems
Anne Driemel, Morteza Monemizadeh, Eunjin Oh, Frank Staals, David P. Woodruff
TL;DR
This paper studies sublinear estimators for Earth Mover Distance (EMD) and Euclidean Minimum Spanning Tree (MST) on point sets in $[Δ]^d$ that can be accessed only through a range-counting oracle. For EMD, it develops dimension-aware sublinear algorithms, using a long/short edge decomposition in 1D and a quadtree-based extension in higher dimensions, that achieve $O(\log Δ)$ relative error and $O(nΔ/s^{1+1/d})$ additive error with a sublinear query budget controlled by a parameter $s$. For MST, it gives a $(1 \pm ε)$-approximation of the tree weight using $\tilde{O}(\sqrt{n})$ queries. The work also introduces a uniform sampling primitive for non-empty grid cells and reduces MST estimation to a component-counting problem via a quadtree-based spanner and a well-separated pair decomposition (WSPD). A matching lower bound shows that the EMD tradeoff is essentially optimal. These results enable sublinear estimation of geometric costs in large-scale data systems where only range counts are accessible.
Abstract
In this paper, we study estimators for geometric optimization problems in the sublinear geometric model. In this model, we have oracle access to a point set of size $n$ in a discrete space $[Δ]^d$: the oracle answers orthogonal range counting queries. The query complexity of an optimization problem is the number of oracle queries required to compute an estimator for the problem. We investigate two problems in this framework, the Euclidean Minimum Spanning Tree (MST) and Earth Mover Distance (EMD). For EMD, we show the existence of an estimator that approximates the cost of EMD with $O(\log Δ)$-relative error and $O(\frac{nΔ}{s^{1+1/d}})$-additive error using $O(s \operatorname{polylog} Δ)$ range counting queries for any parameter $s$ with $1\leq s \leq n$. Moreover, we prove that this bound is tight. For MST, we demonstrate that the weight of the MST can be estimated within a factor of $(1 \pm \varepsilon)$ using $\tilde{O}(\sqrt{n})$ range counting queries.
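To make the access model concrete, the following is a minimal sketch of a range-counting oracle in one dimension, with a counter that tracks query complexity. It is an illustration under stated assumptions, not one of the paper's estimators: the class `RangeCountingOracle`, the function `kth_smallest`, and the sample points are all hypothetical names and data chosen for this example. The binary search shows how an algorithm that sees only range counts can locate an order statistic with $O(\log Δ)$ queries.

```python
from bisect import bisect_left, bisect_right


class RangeCountingOracle:
    """Toy 1D oracle over points in {1, ..., delta} (hypothetical helper).

    The algorithm may only call count(); every call is tallied so the
    query complexity can be inspected afterwards.
    """

    def __init__(self, points, delta):
        self._sorted = sorted(points)  # hidden from the algorithm
        self.delta = delta
        self.queries = 0

    def count(self, lo, hi):
        """Return the number of points p with lo <= p <= hi."""
        self.queries += 1
        return bisect_right(self._sorted, hi) - bisect_left(self._sorted, lo)


def kth_smallest(oracle, k):
    """Find the k-th smallest point (1-indexed) by binary search on [1, delta].

    Each step issues one prefix-count query, so about log2(delta) queries total.
    """
    lo, hi = 1, oracle.delta
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle.count(1, mid) >= k:
            hi = mid  # at least k points lie in [1, mid]
        else:
            lo = mid + 1  # fewer than k points lie in [1, mid]
    return lo


# Illustrative data only (an assumption, not taken from the paper).
oracle = RangeCountingOracle([3, 3, 10, 42, 57, 57, 90], delta=128)
n = oracle.count(1, oracle.delta)            # one query to learn n
median = kth_smallest(oracle, (n + 1) // 2)  # seven more queries for delta = 128
print(median, oracle.queries)
```

The query counter is the quantity the model charges for; the cost of the hidden sorted array inside the oracle is deliberately ignored, mirroring how query complexity is defined above.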
