
Are We Still Missing an Item?

Roey Magen

TL;DR

The paper studies the Missing Item Finding (MIF) problem in streaming settings, examining both static long streams and adversarially robust variants. It shows polylogarithmic-space solutions even when the stream length $\ell$ is very close to the universe size $n$, and tight linear-in-$k$ bounds for the long regime where $\ell=n+k$. It establishes lower bounds for adversarially robust models, including a pseudo-deterministic zero-error bound of $\tilde{\Omega}(\ell/\log^2 n)$ and a random-start bound of $\tilde{\Omega}(\sqrt{\ell}/\log n)$, while demonstrating a random-on-the-fly model that achieves $O(\log n)$ space for small $\ell$ (specifically $\ell=o(\sqrt{n})$). By linking to communication complexity (notably disjointness) and related problems, the work clarifies the separation between static and adversarial streaming models and informs the design of space-efficient algorithms for robust streaming tasks.
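
The random-on-the-fly bound for small $\ell$ can be made concrete with a short sketch. It illustrates one natural strategy (keep a single candidate and redraw it whenever the stream hits it) and is not necessarily the paper's construction; the names `ResamplingMIF` and `run_adaptive_adversary` are hypothetical. The state is a single element of $\{1,\ldots,n\}$, so $O(\log n)$ bits. By a union bound over at most $\ell$ redraws, each of which lands on one of at most $\ell$ earlier items with probability about $\ell/n$, the failure probability is roughly $\ell^2/n$, which is small when $\ell=o(\sqrt{n})$.

```python
import random


class ResamplingMIF:
    """Hypothetical sketch: store one candidate, redraw it when the stream hits it."""

    def __init__(self, n, rng=None):
        if n < 2:
            raise ValueError("need a universe of size at least 2")
        self.n = n
        self.rng = rng if rng is not None else random.Random()
        self.guess = self.rng.randint(1, n)  # the only state: O(log n) bits

    def process(self, item):
        """Consume one stream item and return the current candidate."""
        # Fresh randomness is drawn mid-stream; a random-start algorithm
        # would not be allowed to do this.
        while self.guess == item:
            self.guess = self.rng.randint(1, self.n)
        return self.guess


def run_adaptive_adversary(n, length, seed=0):
    """Feed each output back as the next item.

    Returns True if every output was missing from the stream seen so far.
    """
    alg = ResamplingMIF(n, random.Random(seed))
    seen = set()
    item = 1  # the first item is arbitrary
    for _ in range(length):
        seen.add(item)
        out = alg.process(item)
        if out in seen:
            return False  # the candidate collided with an earlier item
        item = out  # the adversary reacts to the output it just observed
    return True
```

The adversary here always attacks the most recent output, which forces a redraw at every step once it has hit the candidate; it still cannot predict where the next redraw will land.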

Abstract

The missing item problem, as introduced by Stoeckl in his work at SODA 23, focuses on continually identifying a missing element $e$ in a stream of elements $\{e_1, \ldots, e_{\ell}\}$ from the set $\{1,2,\ldots,n\}$, such that $e \neq e_i$ for any $i \in \{1,\ldots,\ell\}$. Stoeckl's investigation primarily delves into scenarios with $\ell<n$, providing bounds for (i) the deterministic case, (ii) the static case -- where the algorithm might be randomized but the stream is fixed in advance, and (iii) the adversarially robust case -- where the algorithm is randomized and each stream element can be chosen depending on earlier algorithm outputs. Building upon this foundation, our paper addresses previously unexplored aspects of the missing item problem. In the first segment, we examine the static setting with a long stream, where the length of the stream $\ell$ is close to or even exceeds the size of the universe $n$. We present an algorithm demonstrating that even when $\ell$ is very close to $n$ (say $\ell=n-1$), polylog($n$) bits of memory suffice to identify the missing item. When the stream's length $\ell$ exceeds the size of the universe $n$, i.e., $\ell = n+k$, we show a tight bound of roughly $\Theta(k)$. The second segment focuses on the adversarially robust setting. We show a lower bound of approximately $\Omega(\ell)$, up to polylog factors, for a pseudo-deterministic error-zero algorithm (one that reports its errors). Based on Stoeckl's work and the previous result, we establish a tight bound of roughly $\Theta(\sqrt{\ell})$ for a random-start error-zero streaming algorithm (one that uses randomness only at initialization).
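
To make the task concrete, the following minimal sketch spells out the output condition and a trivial baseline that stores one flag per universe element. It is not an algorithm from the paper; the names `is_valid_answer` and `BitmapMIF` are hypothetical. The baseline needs on the order of $n$ bits, which is the cost that the polylog($n$) result for $\ell=n-1$ avoids in the static setting.

```python
def is_valid_answer(e, prefix, n):
    """True if e is in {1..n} and differs from every item seen so far."""
    return 1 <= e <= n and e not in set(prefix)


class BitmapMIF:
    """Trivial deterministic baseline (assumption: not from the paper)."""

    def __init__(self, n):
        self.n = n
        # One flag per element; a packed bit array would use n bits.
        self.seen = bytearray(n + 1)
        self.lowest = 1  # smallest element not yet seen

    def process(self, item):
        """Consume one item; return the smallest missing element, or None."""
        self.seen[item] = 1
        # The pointer only moves forward, so total work is O(n + stream length).
        while self.lowest <= self.n and self.seen[self.lowest]:
            self.lowest += 1
        return self.lowest if self.lowest <= self.n else None


alg = BitmapMIF(5)
outputs = [alg.process(x) for x in [2, 1, 1, 4]]
print(outputs)  # [1, 3, 3, 3]
```

Once the stream has covered the whole universe no valid answer exists, and the sketch returns `None`; this can happen only when $\ell \ge n$.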

Paper Structure (9 sections, 9 theorems, 19 equations, 1 table, 1 algorithm)


Key Result

Theorem 1

Any randomized communication protocol that computes the disjointness function with error $1/2-\epsilon$ must have communication $\Omega(\epsilon n)$.
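
For reference, the disjointness function in Theorem 1 is usually defined as follows. This is the standard definition; the notation $\mathrm{DISJ}_n$ is an assumption here and may differ from the paper's. Alice holds $x$ and Bob holds $y$, viewed as indicator vectors of subsets of $\{1,\ldots,n\}$:

$$
\mathrm{DISJ}_n(x,y) =
\begin{cases}
1 & \text{if } x_i y_i = 0 \text{ for all } i \in \{1,\ldots,n\},\\
0 & \text{otherwise,}
\end{cases}
\qquad x, y \in \{0,1\}^n.
$$

Error $1/2-\epsilon$ means that the protocol outputs the correct value with probability at least $1/2+\epsilon$ on every input, so the theorem says that even a small advantage $\epsilon$ over random guessing costs $\Omega(\epsilon n)$ bits of communication.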

Theorems & Definitions (21)

  • Definition 1: Pseudo-deterministic Streaming Algorithm
  • Definition 2: Zero Error Algorithm
  • Definition 3: Random Start Algorithm
  • Definition 4: Streaming Algorithm with randomness on the fly
  • Theorem 1: Theorem 6.19 in rao2020communication
  • Lemma 1: Lemma 4.1, from chakrabarti2021adversarially
  • Theorem 2: Thm. 9 from jayaram2021perfect
  • Theorem 3
  • proof
  • Remark 1
  • ...and 11 more