Support Testing in the Huge Object Model

Tomer Adar, Eldar Fischer, Amit Levi

TL;DR

This work investigates the problem of testing whether a distribution is supported on $m$ elements in the Huge Object model and proves lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regime.

Abstract

The Huge Object model is a distribution testing model in which we are given access to independent samples from an unknown distribution over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from the samples. We investigate the problem of testing whether a distribution is supported on $m$ elements in this model. It turns out that the behavior of this property is surprisingly intricate, especially when also considering the question of adaptivity. We prove lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regime. Our bounds are tight when $m$ is fixed to a constant (and the distance parameter $\varepsilon$ is the only variable). For the general case, our bounds are at most $O(\log m)$ apart. In particular, our results show a surprising $O(\log \varepsilon^{-1})$ gap between the number of queries required for non-adaptive testing and for adaptive testing. For one-sided error testing, we also show that an $O(\log m)$ gap between the number of samples and the number of queries is necessary. Our results utilize a wide variety of combinatorial and probabilistic methods.
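To make the access model concrete, the following is a minimal Python sketch of sample-and-query access as described above, under stated assumptions. The names `HugeObjectOracle`, `draw`, `query`, and `naive_one_sided_check` are hypothetical and do not come from the paper, and the check shown is a naive non-adaptive illustration of one-sided error, not any of the paper's algorithms. It rejects only when the queried bits witness more than $m$ distinct strings, so a distribution supported on at most $m$ elements is always accepted.

```python
import random


class HugeObjectOracle:
    """Illustrative sample-and-query access to a distribution over {0,1}^n.

    Hypothetical helper: the tester sees sample indices and individual bits,
    never whole strings.
    """

    def __init__(self, support, weights, seed=0):
        self._support = list(support)   # hidden strings, all of length n
        self._weights = list(weights)   # their probabilities (or relative weights)
        self._rng = random.Random(seed)
        self._samples = []              # samples drawn so far, hidden from the tester
        self.queries = 0                # number of bit queries made so far

    def draw(self):
        """Draw a fresh independent sample and return its index only."""
        s = self._rng.choices(self._support, weights=self._weights, k=1)[0]
        self._samples.append(s)
        return len(self._samples) - 1

    def query(self, sample_idx, bit_idx):
        """Reveal one bit of a previously drawn sample; each call costs one query."""
        self.queries += 1
        return int(self._samples[sample_idx][bit_idx])


def naive_one_sided_check(oracle, m, num_samples, positions):
    """Naive non-adaptive check (illustration only, not the paper's tester).

    Queries the same fixed positions in every sample and rejects only if the
    observed bit patterns prove that more than m distinct strings were drawn.
    """
    idxs = [oracle.draw() for _ in range(num_samples)]
    patterns = {tuple(oracle.query(i, j) for j in positions) for i in idxs}
    return len(patterns) <= m  # True means accept


if __name__ == "__main__":
    oracle = HugeObjectOracle(["0000", "1111"], [0.5, 0.5], seed=1)
    accepted = naive_one_sided_check(oracle, m=2, num_samples=20, positions=[0, 2])
    print(accepted, oracle.queries)
```

The sketch separates the two resources the abstract distinguishes: the number of samples drawn (`num_samples`) and the number of bits queried (`oracle.queries`). Because the positions are fixed before any answer is seen, the check is non-adaptive; an adaptive variant would choose each position based on earlier answers.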

Paper Structure (41 sections, 40 theorems, 41 equations, 6 algorithms)

Key Result

Lemma 2.6

Consider a black-box subroutine $\mathcal{A}$ with fail stability (Definition def:fail-stability) and diminishing returns (Definition def:diminishing-returns) with respect to a common set $G$ of outcomes indicating success. For an algorithm that repeatedly executes $\mathcal{A}$, we define the following ... Considering the parameters $p > 0$ (threshold), $q > 0$ (confidence), and $k \ge 1$ (goal), there e...

Theorems & Definitions (106)

  • Definition 1.1: String distance
  • Definition 1.2: Transfer distribution
  • Definition 1.3: Variation distance
  • Definition 1.4: Earth mover's distance
  • Definition 1.5: A property
  • Definition 1.6: Distance of a distribution from a property
  • Definition 1.7: $\varepsilon$-test
  • Definition 1.8: one-sided and two-sided $\varepsilon$-test
  • Definition 2.1: Matrix representation of input access
  • Definition 2.2: Adaptive algorithm
  • ...and 96 more