Support Testing in the Huge Object Model

Tomer Adar; Eldar Fischer; Amit Levi

TL;DR

This work investigates the problem of testing whether a distribution is supported on $m$ elements in the Huge Object model, and proves lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regimes.

Abstract

The Huge Object model is a distribution testing model in which we are given access to independent samples from an unknown distribution over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from the samples. We investigate the problem of testing whether a distribution is supported on $m$ elements in this model. It turns out that the behavior of this property is surprisingly intricate, especially when also considering the question of adaptivity. We prove lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regimes. Our bounds are tight when $m$ is fixed to a constant (and the distance parameter $\varepsilon$ is the only variable). For the general case, our bounds are at most $O(\log m)$ apart. In particular, our results show a surprising $O(\log \varepsilon^{-1})$ gap between the number of queries required for non-adaptive testing as compared to adaptive testing. For one-sided error testing, we also show that an $O(\log m)$ gap between the number of samples and the number of queries is necessary. Our results utilize a wide variety of combinatorial and probabilistic methods.
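
The access model described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the names `HugeObjectOracle` and `naive_support_check` are hypothetical, the property is read as "supported on at most $m$ strings", and the check is a naive non-adaptive baseline with one-sided error, not any of the paper's algorithms. It only shows how samples and bit queries are counted separately, and why a one-sided tester may reject only after seeing more than $m$ distinct patterns.

```python
import random
from typing import List, Sequence


class HugeObjectOracle:
    """Illustrative sample/query access to a distribution over {0,1}^n."""

    def __init__(self, support: Sequence[str], weights: Sequence[float], seed: int = 0):
        self.n = len(support[0])          # length of every string
        self._support = list(support)
        self._weights = list(weights)
        self._rng = random.Random(seed)
        self._samples: List[str] = []     # drawn samples, hidden from the tester
        self.queries = 0                  # number of bit queries made so far

    def draw(self) -> int:
        """Draw a fresh independent sample; return its index, reveal no bits."""
        self._samples.append(self._rng.choices(self._support, self._weights)[0])
        return len(self._samples) - 1

    def query(self, i: int, j: int) -> int:
        """Reveal bit j of sample i and count it as one query."""
        self.queries += 1
        return int(self._samples[i][j])


def naive_support_check(oracle: HugeObjectOracle, m: int,
                        num_samples: int, num_indices: int, seed: int = 0) -> bool:
    """Naive non-adaptive baseline (an assumption, not the paper's tester).

    The bit positions are fixed before any sample is queried, so the check is
    non-adaptive. It rejects (returns False) only if the queried positions
    show more than m distinct patterns, which cannot happen when the
    distribution is supported on at most m strings: the error is one-sided.
    """
    rng = random.Random(seed)
    indices = rng.sample(range(oracle.n), num_indices)  # chosen in advance
    patterns = set()
    for _ in range(num_samples):
        i = oracle.draw()
        patterns.add(tuple(oracle.query(i, j) for j in indices))
    return len(patterns) <= m
```

In this sketch the query count is simply `num_samples * num_indices`. It makes no claim about how many samples or queries are needed to reject distributions that are $\varepsilon$-far from the property, which is the question the paper's bounds address.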

Paper Structure (41 sections, 40 theorems, 41 equations, 6 algorithms)

Introduction
Definition of the model
Summary of our results
Table of results
Adaptive vs. non-adaptive two-sided asymptotic gap
One-sided bounds and a gap from the standard model
A new algorithmic paradigm
A hybrid probabilistic-extremal analysis
A new use for an old combinatorial result
Open problems
One-sided non-adaptive bounds
Non-trivial two-sided bounds
One-sided adaptive bounds
The tradeoffs between sample and query complexity
Preliminaries
...and 26 more sections

Key Result

Lemma 2.6

Consider a black box subroutine $\mathcal{A}$ with fail stability (Definition def:fail-stability) and diminishing returns (Definition def:diminishing-returns) with respect to a common set $G$ of outcomes indicating success. For an algorithm that repeatedly executes $\mathcal{A}$, we define the follo… Considering the parameters $p > 0$ (threshold), $q > 0$ (confidence), and $k \ge 1$ (goal), there e…

Theorems & Definitions (106)

Definition 1.1: String distance
Definition 1.2: Transfer distribution
Definition 1.3: Variation distance
Definition 1.4: Earth mover's distance
Definition 1.5: A property
Definition 1.6: Distance of a distribution from a property
Definition 1.7: $\varepsilon$-test
Definition 1.8: one-sided and two-sided $\varepsilon$-test
Definition 2.1: Matrix representation of input access
Definition 2.2: Adaptive algorithm
...and 96 more
