Finding missing items requires strong forms of randomness

Amit Chakrabarti, Manuel Stoeckl

TL;DR

The paper shows that, for the Missing Item Finding problem, the space complexity of adversarially robust streaming algorithms depends on how the algorithm is allowed to access randomness. It gives an upper bound for random-tape algorithms based on a multi-level randomized tree, together with lower bounds that separate the random-seed (RS), random-tape, and random-oracle models. The RS lower bound follows from a new lower bound for pseudo-deterministic (PD) streaming algorithms via a PD-to-RS reduction from prior work. In the regimes studied, the gaps are exponential: $\ell^{\Omega(1)}$ versus $O(\text{polylog}(\ell))$ bits, both between the random-seed and random-tape models and between the random-tape and random-oracle models. The results also indicate that sampling-based approaches can outperform sketching in adversarial settings, and that oracle-like randomness sharply reduces space in some settings but not in others.

Abstract

Adversarially robust streaming algorithms are required to process a stream of elements and produce correct outputs, even when each stream element can be chosen as a function of earlier algorithm outputs. As with classic streaming algorithms, which must only be correct for the worst-case fixed stream, adversarially robust algorithms with access to randomness can use significantly less space than deterministic algorithms. We prove that for the Missing Item Finding problem in streaming, the space complexity also significantly depends on how adversarially robust algorithms are permitted to use randomness. (In contrast, the space complexity of classic streaming algorithms does not depend as strongly on the way randomness is used.) For Missing Item Finding on streams of length $\ell$ with elements in $\{1,\ldots,n\}$, and $\le 1/\text{poly}(\ell)$ error, we show that when $\ell = O(2^{\sqrt{\log n}})$, "random seed" adversarially robust algorithms, which only use randomness at initialization, require $\ell^{\Omega(1)}$ bits of space, while "random tape" adversarially robust algorithms, which may make random decisions at any time, may use $O(\text{polylog}(\ell))$ space. When $\ell$ is between $n^{\Omega(1)}$ and $O(\sqrt{n})$, "random tape" adversarially robust algorithms need $\ell^{\Omega(1)}$ space, while "random oracle" adversarially robust algorithms, which can read from a long random string for free, may use $O(\text{polylog}(\ell))$ space. The space lower bound for the "random seed" case follows, by a reduction given in prior work, from a lower bound for pseudo-deterministic streaming algorithms given in this paper.
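
To make the adaptive setting in the abstract concrete, the following is a minimal sketch of the game between a Missing Item Finding algorithm and an adversary that chooses each stream element from the algorithm's previous output. All names here (`PrefixBitmapMIF`, `FixedGuessMIF`, `echo_adversary`, `play`) are hypothetical and not taken from the paper, and the harness assumes the algorithm reports an output before the first element and after every element. `PrefixBitmapMIF` is the simple deterministic baseline that tracks the candidates $1,\ldots,\ell+1$ using $\ell+1$ bits; `FixedGuessMIF` uses randomness only at initialization and errs with probability at most $\ell/n$ on any stream fixed in advance, yet it fails immediately against the adaptive adversary.

```python
import random


class PrefixBitmapMIF:
    """Deterministic baseline (illustrative, not the paper's algorithm).

    Tracks which of the candidates 1..ell+1 have appeared. A stream of at
    most ell items cannot cover all ell+1 candidates, so one is always free.
    """

    def __init__(self, n, ell):
        assert ell + 1 <= n, "need at least ell+1 possible items"
        self.limit = ell + 1
        self.seen = [False] * (self.limit + 1)  # index 0 is unused

    def update(self, item):
        if item <= self.limit:
            self.seen[item] = True

    def output(self):
        for v in range(1, self.limit + 1):
            if not self.seen[v]:
                return v
        raise RuntimeError("stream is longer than ell")


class FixedGuessMIF:
    """Naive strategy: draw one item at initialization and always output it.

    Keeps only the guess itself, but is not adversarially robust.
    """

    def __init__(self, n, ell, rng):
        self.guess = rng.randint(1, n)

    def update(self, item):
        pass  # ignores the stream entirely

    def output(self):
        return self.guess


def echo_adversary(previous_output):
    """Adaptive adversary: the next element is the algorithm's last output."""
    return previous_output


def play(alg, adversary, ell):
    """Run the adaptive game for ell steps.

    Returns True iff every output after an update was missing from the
    stream seen so far.
    """
    stream = set()
    out = alg.output()  # assumption: an output exists before any element
    for _ in range(ell):
        item = adversary(out)
        stream.add(item)
        alg.update(item)
        out = alg.output()
        if out in stream:
            return False
    return True


if __name__ == "__main__":
    n, ell = 100, 10
    print(play(PrefixBitmapMIF(n, ell), echo_adversary, ell))
    print(play(FixedGuessMIF(n, ell, random.Random(0)), echo_adversary, ell))
```

The baseline survives the echo adversary because its correctness does not depend on hidden random choices, while the fixed guess is revealed by its first output and then inserted into the stream. The paper's results concern the space needed between these two extremes under different models of access to randomness.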

Paper Structure (26 sections, 29 theorems, 126 equations, 3 figures, 1 table, 4 algorithms)

Key Result

Lemma 3.1

Let $X_1,\ldots,X_t$ be $[0,1]$ random variables, and $\alpha \ge 0$. If, for all $i \in [t]$, $\mathbb{E}[X_i \mid X_1,\ldots,X_{i-1}] \le p_i$, then […]. On the other hand, if for all $i$, $\mathbb{E}[X_i \mid X_1,\ldots,X_{i-1}] \ge p_i$, then […].
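
For orientation, multiplicative Azuma-type bounds under these hypotheses are commonly stated in the form sketched below. This is a standard form rather than a quotation of the paper: the shorthand $\mu = \sum_{i=1}^{t} p_i$ is introduced here, the lower-tail bound is written for $\alpha \in [0,1)$, and the exact expressions in the paper's Lemma 3.1 may differ.

```latex
% Assumed shorthand (not from the source): \mu = \sum_{i=1}^{t} p_i.
\begin{align*}
  % Upper tail, when E[X_i | X_1, ..., X_{i-1}] <= p_i for all i:
  \Pr\Big[\sum_{i=1}^{t} X_i \ge (1+\alpha)\,\mu\Big]
    &\le \exp\!\big(-\big((1+\alpha)\ln(1+\alpha) - \alpha\big)\,\mu\big), \\
  % Lower tail, when E[X_i | X_1, ..., X_{i-1}] >= p_i for all i and 0 <= \alpha < 1:
  \Pr\Big[\sum_{i=1}^{t} X_i \le (1-\alpha)\,\mu\Big]
    &\le \exp\!\big(-\big((1-\alpha)\ln(1-\alpha) + \alpha\big)\,\mu\big).
\end{align*}
```

Both bounds follow from the usual exponential-moment argument: convexity gives $\mathbb{E}[e^{\lambda X_i} \mid X_1,\ldots,X_{i-1}] \le \exp((e^{\lambda}-1)\,p_i)$ for $\lambda \ge 0$, and Markov's inequality with $\lambda = \ln(1+\alpha)$ yields the upper tail; the lower tail is symmetric.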

Figures (3)

  • Figure 1: Known bounds for the space complexity of $\textsc{mif}(n,\ell)$ in different streaming models, at error level $\delta=1/n^2$. This is a log-log plot. Upper and lower bounds are drawn using lines of the same color; the region between them is shaded. The upper and lower bounds shown all match (up to $\mathrm{polylog}(n)$ factors) except for the case of adversarially robust, random tape algorithms. Pseudo-deterministic and deterministic complexities match within $\mathrm{polylog}(n)$ factors.
  • Figure 2: A diagram illustrating the state of an instance of the algorithm labeled alg:rt-example on an example input. Positions on the horizontal axis correspond to integers in $[n]$; the set of values in the input stream ($\{1,2,4,9,12,13,\ldots\}$) is marked with black squares; the current output value ($15$) with a circle. Outside this example, $L$ need not be contiguous or in sorted order.
  • Figure 3: Diagram showing the state of the algorithm labeled alg:rt and how it relates to the parts of the implicit random tree that the algorithm traverses. Positions on the horizontal axis correspond to different integers in $[n]$. To keep the example legible, we set parameters $d = 3$, $w_1 = 7, w_2 = 4, w_3 = 3$, and $b_1 = 4, b_2 = 3, b_3 = 3$.

Theorems & Definitions (59)

  • Lemma 3.1: Multiplicative Azuma's inequality
  • Lemma 3.2: Chernoff bound with negative association, from JoagDevP83
  • Lemma 3.3: Error amplification by majority vote
  • Theorem 3.4: avoid communication lower bound, from ChakrabartiGS22
  • Theorem 3.5: Adversarially robust random oracle lower bound, from Stoeckl23
  • proof
  • Lemma 3.6
  • proof
  • Definition 4.1
  • Lemma 4.2: Base case
  • ...and 49 more