Testing Identity of Distributions under Kolmogorov Distance in Polylogarithmic Space

Christian Janos Lebeda, Jakub Tětek

Abstract

Suppose we have a sample from a distribution $D$ and we want to test whether $D = D^*$ for a fixed distribution $D^*$. Specifically, we want to reject with constant probability if the distance of $D$ from $D^*$ is $\geq \varepsilon$ in a given metric. In the case of continuous distributions, this has been studied thoroughly in the statistics literature. Namely, for the well-studied Kolmogorov metric, a test is known that uses the optimal $O(1/\varepsilon^2)$ samples. However, a naive implementation of this test also uses space $O(1/\varepsilon^2)$, and previous work improved this to $O(1/\varepsilon)$. In this paper, we show that much less space suffices: we give an algorithm that uses space $O(\log^4 \varepsilon^{-1})$ in the streaming setting while also using an asymptotically optimal number of samples. This is in contrast with the standard total variation distance on discrete distributions, for which such a space reduction is known to be impossible. Finally, we state 9 related open problems that we hope will spark interest in this and related problems.
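For orientation, the naive baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the classical one-sample Kolmogorov-Smirnov statistic, which stores all samples, and is not the paper's polylogarithmic-space algorithm. The function names, the acceptance threshold $\varepsilon/2$, and the constant 8 in the sample count are assumptions chosen for illustration; with them, the Dvoretzky-Kiefer-Wolfowitz inequality bounds the error probability by $2e^{-4}$ in both cases.

```python
import math


def sample_count(eps):
    # Assumed constant: n = ceil(8 / eps^2) samples. By the DKW inequality,
    # P(sup |F_n - F| > eps/2) <= 2 * exp(-2 * n * (eps/2)^2) <= 2 * exp(-4).
    return math.ceil(8 / eps ** 2)


def kolmogorov_statistic(samples, cdf):
    # sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of the samples
    # and F = cdf is the CDF of the reference distribution D*.
    xs = sorted(samples)  # stores every sample: O(n) = O(1/eps^2) space
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from k/n to (k+1)/n at x; check both sides.
        d = max(d, (k + 1) / n - f, f - k / n)
    return d


def naive_identity_test(samples, cdf, eps):
    # Accept (return True) when the empirical distance is below eps/2.
    return kolmogorov_statistic(samples, cdf) < eps / 2
```

Sorting the whole sample is what costs $O(1/\varepsilon^2)$ space here; the paper's contribution is to avoid storing the sample.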

Paper Structure

This paper contains 25 sections, 4 theorems, 4 equations.

Key Result

Lemma 3.1

Let us have two continuous distributions $D,D^*$ over a totally ordered set $U$. Suppose that $d_K(D,D^*) \geq \varepsilon$. Then there exists $j \in [\lceil\lg 1/\varepsilon\rceil +2]$ and $i \in [2^j]$ such that $|D^*(B_{i,j}) - D(B_{i,j})| \geq 2 \Delta = \max\left(\frac{\varepsilon}{j^{2}},\fra

Theorems & Definitions (4)

  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 3.1