Finding missing items requires strong forms of randomness

Amit Chakrabarti; Manuel Stoeckl

TL;DR

The paper establishes separations in space complexity for adversarially robust streaming algorithms on the Missing Item Finding problem, depending on how randomness is accessed. It gives an upper bound for random-tape algorithms via a multi-level randomized tree, and lower bounds that separate the random-seed, random-tape, and random-oracle models; the random-seed lower bound is obtained by transferring a new lower bound for pseudo-deterministic algorithms through a reduction from prior work. The results show exponential gaps ($\ell^{\Omega(1)}$ versus $O(\text{polylog}(\ell))$ space) between the random-seed and random-tape models for short streams, and between the random-tape and random-oracle models for longer ones. They also demonstrate that sampling-based approaches can outperform sketching in adversarial settings. The work advances understanding of how constraints on randomness usage affect sublinear-space streaming algorithms under adaptive adversaries.

Abstract

Adversarially robust streaming algorithms are required to process a stream of elements and produce correct outputs, even when each stream element can be chosen as a function of earlier algorithm outputs. As with classic streaming algorithms, which must only be correct for the worst-case fixed stream, adversarially robust algorithms with access to randomness can use significantly less space than deterministic algorithms. We prove that for the Missing Item Finding problem in streaming, the space complexity also significantly depends on how adversarially robust algorithms are permitted to use randomness. (In contrast, the space complexity of classic streaming algorithms does not depend as strongly on the way randomness is used.) For Missing Item Finding on streams of length $\ell$ with elements in $\{1,\ldots,n\}$, and $\le 1/\text{poly}(\ell)$ error, we show that when $\ell = O(2^{\sqrt{\log n}})$, "random seed" adversarially robust algorithms, which only use randomness at initialization, require $\ell^{\Omega(1)}$ bits of space, while "random tape" adversarially robust algorithms, which may make random decisions at any time, may use $O(\text{polylog}(\ell))$ space. When $\ell$ is between $n^{\Omega(1)}$ and $O(\sqrt{n})$, "random tape" adversarially robust algorithms need $\ell^{\Omega(1)}$ space, while "random oracle" adversarially robust algorithms, which can read from a long random string for free, may use $O(\text{polylog}(\ell))$ space. The space lower bound for the "random seed" case follows, by a reduction given in prior work, from a lower bound for pseudo-deterministic streaming algorithms given in this paper.
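To make the problem setting concrete, here is a minimal Python sketch of the adaptive game described above. It is not one of the paper's algorithms: `BitmapMIF` is a hypothetical deterministic baseline that spends about $\ell + 1$ bits remembering which of the candidates $1,\ldots,\ell+1$ have appeared (assuming $\ell < n$, so all candidates lie in $\{1,\ldots,n\}$), and `play_adaptive_game` is a hypothetical adversary that always inserts the algorithm's latest output.

```python
from typing import List


class BitmapMIF:
    """Hypothetical deterministic baseline for Missing Item Finding.

    Tracks which of the candidates 1..length+1 have appeared. A stream of at
    most `length` items cannot cover all length+1 candidates, so some
    candidate is always missing.
    """

    def __init__(self, length: int) -> None:
        # Index 0 is unused; indices 1..length+1 are the candidates.
        self.seen = [False] * (length + 2)

    def update(self, item: int) -> None:
        # Items outside the candidate range never need to be remembered.
        if 1 <= item < len(self.seen):
            self.seen[item] = True

    def output(self) -> int:
        # Return the smallest candidate that has not appeared so far.
        for value in range(1, len(self.seen)):
            if not self.seen[value]:
                return value
        raise RuntimeError("stream is longer than the promised length")


def play_adaptive_game(alg: BitmapMIF, length: int) -> List[int]:
    """Adaptive adversary: each new stream element is the last output."""
    stream: List[int] = []
    for _ in range(length):
        guess = alg.output()
        # Correctness requirement: the output must be missing from the stream.
        assert guess not in stream
        stream.append(guess)
        alg.update(guess)
    assert alg.output() not in stream
    return stream
```

Because this baseline is deterministic, feeding its outputs back to it cannot cause an error; the interest of the paper lies in how far below this linear space cost randomized algorithms can go while still surviving such feedback.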

Paper Structure (26 sections, 29 theorems, 126 equations, 3 figures, 1 table, 4 algorithms)

Introduction
Groundwork for Our Results
Our Results
Related Work
Technical Overview
Random Tape Upper Bound
Random Tape Lower Bound
Random Seed Lower Bound via Pseudo-Determinism
Pseudo-Deterministic Lower Bound
Preliminaries
Useful Lemmas
The Random Tape Lower Bound
Setup and Base Case
The Induction Step
Calculating the Lower Bound
...and 11 more sections

Key Result

Lemma 3.1

Let $X_1,\ldots,X_t$ be $[0,1]$ random variables, and $\alpha \ge 0$. If, for all $i \in [t]$, $\mathbb{E}[X_i \mid X_1,\ldots,X_{i-1}] \le p_i$, then [displayed inequality missing]. On the other hand, if for all $i \in [t]$, $\mathbb{E}[X_i \mid X_1,\ldots,X_{i-1}] \ge p_i$, then [displayed inequality missing].
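For orientation, the textbook multiplicative Chernoff-type tail bounds that hold under exactly these conditional-expectation hypotheses are sketched below. This is the standard form, not necessarily the lemma's own statement: the symbol $\mu$ is notation introduced here, the lower tail is stated under the added assumption $0 \le \alpha < 1$, and the constants may differ from those in the paper.

```latex
% Notation introduced for this sketch (not from the source):
\[
  \mu := \sum_{i=1}^{t} p_i .
\]
% Upper tail, assuming E[X_i | X_1,...,X_{i-1}] <= p_i for all i:
\[
  \Pr\Big[\sum_{i=1}^{t} X_i \ge (1+\alpha)\,\mu\Big]
  \;\le\; \left(\frac{e^{\alpha}}{(1+\alpha)^{1+\alpha}}\right)^{\mu}
  \;\le\; \exp\!\left(-\frac{\alpha^{2}\mu}{2+\alpha}\right).
\]
% Lower tail, assuming E[X_i | X_1,...,X_{i-1}] >= p_i for all i
% and 0 <= alpha < 1:
\[
  \Pr\Big[\sum_{i=1}^{t} X_i \le (1-\alpha)\,\mu\Big]
  \;\le\; \left(\frac{e^{-\alpha}}{(1-\alpha)^{1-\alpha}}\right)^{\mu}
  \;\le\; \exp\!\left(-\frac{\alpha^{2}\mu}{2}\right).
\]
```

Both bounds follow from the usual moment-generating-function argument applied one variable at a time, which is why only the conditional expectations, and not independence, are needed.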

Figures (3)

Figure 1: Known bounds for the space complexity of $\textsc{mif}(n,\ell)$ in different streaming models, at error level $\delta=1/n^2$. This is a log-log plot. Upper and lower bounds are drawn using lines of the same color; the region between them is shaded. The upper and lower bounds shown all match (up to $\mathrm{polylog}(n)$ factors) except for the case of adversarially robust, random tape algorithms. Pseudo-deterministic and deterministic complexities match within $\mathrm{polylog}(n)$ factors.
Figure 2: A diagram illustrating the state of an instance of the algorithm (label alg:rt-example) on an example input. Positions on the horizontal axis correspond to integers in $[n]$; the set of values in the input stream ($\{1,2,4,9,12,13,\ldots\}$) is marked with black squares; the current output value ($15$) with a circle. Outside this example, $L$ need not be contiguous or in sorted order.
Figure 3: Diagram showing the state of the algorithm (label alg:rt) and how it relates to the parts of the implicit random tree that the algorithm traverses. Positions on the horizontal axis correspond to different integers in $[n]$. To keep the example legible, we set parameters $d = 3$, $w_1 = 7, w_2 = 4, w_3 = 3$, and $b_1 = 4, b_2 = 3, b_3 = 3$.

Theorems & Definitions (59)

Lemma 3.1: Multiplicative Azuma's inequality
Lemma 3.2: Chernoff bound with negative association, from JoagDevP83
Lemma 3.3: Error amplification by majority vote
Theorem 3.4: avoid communication lower bound, from ChakrabartiGS22
Theorem 3.5: Adversarially robust random oracle lower bound, from Stoeckl23
proof
Lemma 3.6
proof
Definition 4.1
Lemma 4.2: Base case
...and 49 more
