Finding missing items requires strong forms of randomness
Amit Chakrabarti, Manuel Stoeckl
TL;DR
The paper shows that, for adversarially robust streaming algorithms solving the Missing Item Finding problem, space complexity depends sharply on how randomness is accessed. It gives an upper bound for random-tape algorithms via a multi-level randomized tree, and lower bounds that separate the random-seed, random-tape, and random-oracle models. The random-seed (RS) lower bound comes from a new lower bound for pseudo-deterministic (PD) streaming algorithms combined with a PD-to-RS reduction. The results show an exponential gap ($\ell^{\Omega(1)}$ versus $O(\text{polylog}(\ell))$ space) between the random-seed and random-tape models, and demonstrate that sampling-based approaches can outperform sketching in adversarial settings. More broadly, the work clarifies how constraints on the use of randomness affect sublinear-space streaming algorithms under adaptive adversaries, which matters for designing space-efficient robust streaming systems. Oracle-style randomness helps only in some regimes: when $\ell$ is between $n^{\Omega(1)}$ and $O(\sqrt{n})$, random-oracle algorithms need only $O(\text{polylog}(\ell))$ space, while random-tape algorithms need $\ell^{\Omega(1)}$.
Abstract
Adversarially robust streaming algorithms are required to process a stream of elements and produce correct outputs, even when each stream element can be chosen as a function of earlier algorithm outputs. As with classic streaming algorithms, which must only be correct for the worst-case fixed stream, adversarially robust algorithms with access to randomness can use significantly less space than deterministic algorithms. We prove that for the Missing Item Finding problem in streaming, the space complexity also significantly depends on how adversarially robust algorithms are permitted to use randomness. (In contrast, the space complexity of classic streaming algorithms does not depend as strongly on the way randomness is used.) For Missing Item Finding on streams of length $\ell$ with elements in $\{1,\ldots,n\}$, and $\le 1/\text{poly}(\ell)$ error, we show that when $\ell = O(2^{\sqrt{\log n}})$, "random seed" adversarially robust algorithms, which only use randomness at initialization, require $\ell^{\Omega(1)}$ bits of space, while "random tape" adversarially robust algorithms, which may make random decisions at any time, may use $O(\text{polylog}(\ell))$ space. When $\ell$ is between $n^{\Omega(1)}$ and $O(\sqrt{n})$, "random tape" adversarially robust algorithms need $\ell^{\Omega(1)}$ space, while "random oracle" adversarially robust algorithms, which can read from a long random string for free, may use $O(\text{polylog}(\ell))$ space. The space lower bound for the "random seed" case follows, by a reduction given in prior work, from a lower bound for pseudo-deterministic streaming algorithms given in this paper.
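To make the problem and the "random tape" model concrete, here is a minimal Python sketch of a naive strategy; it is not the paper's algorithm, and the names `NaiveRandomTapeMIF` and `play_adaptive_game` are hypothetical. The algorithm keeps a single candidate item from $\{1,\ldots,n\}$ and draws a fresh one whenever the stream hits it, which uses fresh randomness mid-stream. A referee plays an adaptive adversary that always sends the algorithm's last output. As an assumption of the sketch, the `random.Random` object stands in for the random tape and is not counted as working memory.

```python
import random


class NaiveRandomTapeMIF:
    """Hypothetical naive Missing Item Finding strategy (not the paper's).

    Working memory is one integer (O(log n) bits). The rng models the
    random tape: fresh random bits may be read at any time.
    """

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n = n
        self.rng = rng
        self.candidate = rng.randint(1, n)  # uniform over {1, ..., n}

    def process(self, item: int) -> int:
        # If the stream just hit our candidate, draw a fresh one. The new
        # candidate may equal an earlier stream item; the algorithm cannot
        # detect this because it does not store the stream.
        if item == self.candidate:
            self.candidate = self.rng.randint(1, self.n)
        return self.candidate


def play_adaptive_game(n: int, length: int, seed: int) -> bool:
    """Referee for an adaptive adversary that always sends the last output.

    Returns True if every output avoided all items seen so far.
    """
    alg = NaiveRandomTapeMIF(n, random.Random(seed))
    seen = set()  # only the referee stores the stream
    output = alg.candidate
    for _ in range(length):
        item = output  # next element chosen as a function of the last output
        seen.add(item)
        output = alg.process(item)
        if output in seen:
            return False
    return True


if __name__ == "__main__":
    trials = 200
    failures = sum(not play_adaptive_game(10**6, 100, s) for s in range(trials))
    print(f"{failures} failures in {trials} trials")
```

Only a resampling step can produce a wrong output, and at the $t$-th resample at most $t$ items have appeared, so against any adversary a union bound puts this naive strategy's failure probability at most $\ell(\ell+1)/(2n)$. It therefore meets a $1/\text{poly}(\ell)$ error target only when $\ell$ is a sufficiently small power of $n$.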
