Range Counting Oracles for Geometric Problems

Anne Driemel, Morteza Monemizadeh, Eunjin Oh, Frank Staals, David P. Woodruff

TL;DR

This paper studies sublinear estimators for Earth Mover Distance (EMD) and Euclidean MST under a range-counting oracle on point sets in $[Δ]^d$. For EMD, it develops dimension-aware sublinear algorithms, including a long/short edge decomposition in 1D and a quadtree-based extension to higher dimensions, that achieve $O(\log Δ)$ relative error and $O(nΔ/s^{1+1/d})$ additive error using $O(s \operatorname{polylog} Δ)$ queries for any $1 \leq s \leq n$. For MST, it gives an estimator that is accurate within a factor of $(1 \pm \varepsilon)$ using $\tilde{O}(\sqrt{n})$ queries. The work also introduces a uniform sampling primitive for non-empty grid cells and reduces MST estimation to a component-counting problem via a quadtree-based spanner and a well-separated pair decomposition (WSPD). It provides lower bounds showing the proposed tradeoffs are essentially optimal up to polylog factors and dimension. These results enable efficient sublinear estimation of geometric costs in large-scale data systems where only range counts are accessible.

Abstract

In this paper, we study estimators for geometric optimization problems in the sublinear geometric model. In this model, we have oracle access to a point set with size $n$ in a discrete space $[Δ]^d$, where queries can be made to an oracle that responds to orthogonal range counting requests. The query complexity of an optimization problem is measured by the number of oracle queries required to compute an estimator for the problem. We investigate two problems in this framework, the Euclidean Minimum Spanning Tree (MST) and Earth Mover Distance (EMD). For EMD, we show the existence of an estimator that approximates the cost of EMD with $O(\log Δ)$-relative error and $O(\frac{nΔ}{s^{1+1/d}})$-additive error using $O(s \operatorname{polylog} Δ)$ range counting queries for any parameter $s$ with $1\leq s \leq n$. Moreover, we prove that this bound is tight. For MST, we demonstrate that the weight of MST can be estimated within a factor of $(1 \pm \varepsilon)$ using $\tilde{O}(\sqrt{n})$ range counting queries.
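
To make the query model concrete, here is a minimal sketch of an orthogonal range counting oracle that also tracks how many queries have been issued. The class name `RangeCountingOracle`, its `count` method, the inclusive box convention, and the choice of $[Δ] = \{1, \dots, Δ\}$ are all assumptions made for illustration; the paper does not prescribe an interface, and a real oracle would answer from an index rather than by scanning the points.

```python
from typing import Sequence, Tuple

Point = Tuple[int, ...]


class RangeCountingOracle:
    """Hypothetical oracle over a hidden point set in [Delta]^d (illustrative only)."""

    def __init__(self, points: Sequence[Point]) -> None:
        self._points = list(points)  # hidden from the estimation algorithm
        self.queries = 0             # number of range counting queries so far

    def count(self, lo: Point, hi: Point) -> int:
        """Return the number of points p with lo[i] <= p[i] <= hi[i] for every i."""
        self.queries += 1
        # Brute-force scan; stands in for whatever index a real system would use.
        return sum(
            all(l <= x <= h for l, x, h in zip(lo, p, hi))
            for p in self._points
        )


# Usage: four points in [8]^2 (assuming coordinates 1..8).
oracle = RangeCountingOracle([(1, 1), (2, 5), (7, 3), (7, 7)])
n = oracle.count((1, 1), (8, 8))     # whole domain
left = oracle.count((1, 1), (4, 8))  # left half of the domain
```

In this model the cost of an estimator is the final value of `oracle.queries`, not its running time.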

Paper Structure

This paper contains 25 sections, 35 theorems, 4 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

We can select a point of $P$ uniformly at random using $O(\log\Delta)$ range counting queries.
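
One way to realize such a sampler is a coordinate-wise binary search that descends into each half of the current box with probability proportional to its point count, so the branch probabilities telescope to $1/n$ per point. The sketch below is an illustration under assumptions, not necessarily the procedure from the cited lemma: the function name `sample_point`, the callable `count(lo, hi)` with inclusive bounds, and the domain $\{1, \dots, Δ\}^d$ are hypothetical. It issues $1 + d\lceil\log_2 Δ\rceil$ queries, which is $O(\log Δ)$ when $d$ is treated as a constant.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, ...]
CountFn = Callable[[Sequence[int], Sequence[int]], int]


def sample_point(count: CountFn, delta: int, d: int, rng: random.Random) -> Point:
    """Sample a point uniformly at random using only range counting queries."""
    lo: List[int] = [1] * d
    hi: List[int] = [delta] * d
    total = count(lo, hi)  # number of points in the current box
    if total == 0:
        raise ValueError("the point set is empty")
    for axis in range(d):
        # Binary search along this axis until the box has width 1 in it.
        while lo[axis] < hi[axis]:
            mid = (lo[axis] + hi[axis]) // 2
            left_hi = list(hi)
            left_hi[axis] = mid
            left = count(lo, left_hi)
            # Go left with probability left / total, right otherwise.
            if rng.randrange(total) < left:
                hi[axis] = mid
                total = left
            else:
                lo[axis] = mid + 1
                total -= left
    return tuple(lo)


# Usage with a brute-force stand-in for the oracle (assumed, for illustration).
points = [(1, 1), (2, 5), (7, 3), (7, 7)]
calls = [0]  # query counter


def brute_force_count(lo: Sequence[int], hi: Sequence[int]) -> int:
    calls[0] += 1
    return sum(all(l <= x <= h for l, x, h in zip(lo, p, hi)) for p in points)


p = sample_point(brute_force_count, delta=8, d=2, rng=random.Random(0))
```

An empty half is never entered, because `rng.randrange(total) < 0` is always false, so the current box always contains at least one point. If several points share a location, that location is returned with probability proportional to its multiplicity.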

Figures (4)

  • Figure 1: (a) All segments use the near gadget. The cost of this instance is $n$. (b) The gray segment uses the far gadget. The cost of this instance is $\Theta(n\Delta/s^2)$.
  • Figure 2: We partition $[\Delta]^2$ into $16s$ squares. In each square, we place either the far gadget or the near gadget. The cost of the far gadget is at least $n\Delta/s^2$, and the cost of the near gadget is $\Theta(n/s)$.
  • Figure 3: Illustration for the uniform instance and a non-uniform instance. The gray segment is the witness of the non-uniform instance.
  • Figure 4: The domain $[\Delta]$ is partitioned into $16n^{1/3}$ cells. Each cell contains the strip gadget or the uniform gadget. The strip gadget has cost $\Theta(n^{5/6})$ while the uniform gadget has cost $\Theta(n^{7/6})$.

Theorems & Definitions (35)

  • Lemma 1: Telescoping Sampling [DBLP:conf/approx/Monemizadeh23]
  • Theorem 2
  • Corollary 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Theorem 7
  • Corollary 8
  • Lemma 9: [DBLP:conf/stoc/Charikar02, DBLP:conf/stoc/Indyk04]
  • Lemma 10
  • ...and 25 more