Are We Still Missing an Item?
Roey Magen
TL;DR
The paper studies the Missing Item Finding (MIF) problem in streaming settings, examining both static long streams and adversarially robust variants. It shows polylogarithmic-space solutions even when the stream length $\ell$ is very close to the universe size $n$, and tight linear-in-$k$ bounds for the long regime where $\ell = n + k$. It establishes lower bounds for adversarially robust models, including a pseudo-deterministic zero-error bound of $\tilde{\Omega}(\ell/\log^2 n)$ and a random-start bound of $\tilde{\Omega}(\sqrt{\ell}/\log n)$, while demonstrating a random-on-the-fly model that achieves $O(\log n)$ space for small $\ell$ (specifically $\ell = o(\sqrt{n})$). By linking to communication complexity (notably disjointness) and related problems, the work clarifies the separation between static and adversarial streaming models and informs the design of space-efficient algorithms for robust streaming tasks.
Abstract
The missing item problem, as introduced by Stoeckl in his work at SODA '23, focuses on continually identifying a missing element $e$ in a stream of elements $\{e_1, \ldots, e_{\ell}\}$ from the set $\{1, 2, \ldots, n\}$, such that $e \neq e_i$ for every $i \in \{1, \ldots, \ell\}$. Stoeckl's investigation focuses primarily on scenarios with $\ell < n$, providing bounds for (i) the deterministic case, (ii) the static case -- where the algorithm may be randomized but the stream is fixed in advance, and (iii) the adversarially robust case -- where the algorithm is randomized and each stream element can be chosen depending on earlier algorithm outputs. Building upon this foundation, our paper addresses previously unexplored aspects of the missing item problem. In the first segment, we examine the static setting with a long stream, where the length of the stream $\ell$ is close to or even exceeds the size of the universe $n$. We present an algorithm demonstrating that even when $\ell$ is very close to $n$ (say $\ell = n - 1$), $\mathrm{polylog}(n)$ bits of memory suffice to identify the missing item. When the stream's length $\ell$ exceeds the size of the universe $n$, i.e. $\ell = n + k$, we show a tight bound of roughly $\Theta(k)$. The second segment focuses on the adversarially robust setting. We show a lower bound of approximately $\Omega(\ell)$, up to polylog factors, for a pseudo-deterministic error-zero algorithm (one where the algorithm reports its errors). Based on Stoeckl's work and the previous result, we establish a tight bound of roughly $\Theta(\sqrt{\ell})$ for a random-start error-zero streaming algorithm (one that uses randomness only at initialization).
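To make the problem statement concrete, the following is a minimal sketch of the trivial deterministic baseline, which stores one bit per universe element. It is not one of the paper's algorithms: it uses about $n$ bits, whereas the results above concern how much less memory suffices. The class name `NaiveMissingItemFinder` and its `process` method are hypothetical names chosen for illustration.

```python
class NaiveMissingItemFinder:
    """Illustrative n-bit baseline for the missing item problem.

    Assumption: this is only a reference point for the problem definition,
    not an algorithm from the paper. It is deterministic, so its outputs
    stay correct even if later stream elements depend on earlier outputs.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.seen = [False] * (n + 1)  # index 0 is unused; items are 1..n
        self.candidate = 1             # smallest item not yet seen

    def process(self, e: int):
        """Consume one stream element and return an item missing so far.

        Returns None when every item of {1, ..., n} has appeared, i.e. the
        algorithm reports failure instead of giving a wrong answer.
        """
        if not 1 <= e <= self.n:
            raise ValueError("stream element outside the universe")
        self.seen[e] = True
        # The set of seen items only grows, so the smallest unseen item
        # never decreases and the pointer only needs to move forward.
        while self.candidate <= self.n and self.seen[self.candidate]:
            self.candidate += 1
        return self.candidate if self.candidate <= self.n else None


if __name__ == "__main__":
    finder = NaiveMissingItemFinder(n=5)
    for element in [1, 2, 4, 1]:
        print(element, "->", finder.process(element))
```

After each element the sketch outputs an item that has not appeared in the stream so far, which is the "continually identifying" requirement in the definition. Returning `None` once the whole universe has been covered mirrors, loosely, the error-zero convention of reporting a failure rather than returning an incorrect item.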
