A New Information Complexity Measure for Multi-pass Streaming with Applications

Mark Braverman, Sumegha Garg, Qian Li, Shuo Wang, David P. Woodruff, Jiapeng Zhang

TL;DR

This work introduces a multi-pass information complexity framework (MIC) for analyzing streaming problems under multiple passes. For a $k$-pass algorithm $\mathsf M$ with memory size $s$ running on a length-$n$ stream drawn from a product distribution $\mu$, it establishes the general upper bound $\mathrm{MIC}(\mathsf M,\mu)\le 2ksn$, and it develops $\mathrm{MIC}_{cond}$ as a refined tool, enabling tight lower bounds for the coin and needle problems for any fixed number of passes. The coin problem is shown to require $\Omega\left(\frac{\log n}{k}\right)$ bits of memory for $k$ passes, while the needle problem yields an $\Omega\left(\frac{1}{p^2}\right)$ bound (with matching upper bounds in several regimes), resolving open questions and improving prior results. The authors also provide algorithmic upper bounds for the needle problem (e.g., $\mathsf M_1$ and $\mathsf M_2$) and give further streaming applications, including multi-pass lower bounds for $\ell_p$-norm estimation, point queries, heavy hitters, and compressed sensing. Altogether, the paper contributes a toolkit for multi-pass lower bounds and a unified approach to several foundational data-stream problems with stochastic inputs.

Abstract

We introduce a new notion of information complexity for multi-pass streaming problems and use it to resolve several important questions in data streams. In the coin problem, one sees a stream of $n$ i.i.d. uniform bits and one would like to compute the majority with constant advantage. We show that any constant-pass algorithm must use $\Omega(\log n)$ bits of memory, significantly extending an earlier $\Omega(\log n)$ bit lower bound for single-pass algorithms of Braverman-Garg-Woodruff (FOCS, 2020). This also gives the first $\Omega(\log n)$ bit lower bound for the problem of approximating a counter up to a constant factor in worst-case turnstile streams for more than one pass. In the needle problem, one either sees a stream of $n$ i.i.d. uniform samples from a domain $[t]$, or there is a randomly chosen needle $\alpha\in[t]$ for which each item independently is chosen to equal $\alpha$ with probability $p$, and is otherwise uniformly random in $[t]$. The problem of distinguishing these two cases is central to understanding the space complexity of the frequency moment estimation problem in random order streams. We show tight multi-pass space bounds for this problem for every $p < 1/\sqrt{n \log^3 n}$, resolving an open question of Lovett and Zhang (FOCS, 2023); even for $1$-pass our bounds are new. To show optimality, we improve both lower and upper bounds from existing results. Our information complexity framework significantly extends the toolkit for proving multi-pass streaming lower bounds, and we give a wide range of additional streaming applications of our lower bound techniques, including multi-pass lower bounds for $\ell_p$-norm estimation, $\ell_p$-point query and heavy hitters, and compressed sensing problems.
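
To make the two input models in the abstract concrete, the following is a minimal Python sketch of how instances of the coin and needle problems could be sampled. It is not an algorithm from the paper: the function names (`coin_stream`, `majority`, `needle_stream`) and the `planted` flag are hypothetical, chosen only for illustration, and `majority` is the naive full-counter baseline rather than a low-memory streaming algorithm.

```python
import random


def coin_stream(n, rng):
    """Coin problem input: a stream of n i.i.d. uniform bits."""
    return [rng.randrange(2) for _ in range(n)]


def majority(bits):
    """Naive baseline: keep an exact counter of ones (about log2(n) bits of state).

    Returns 1 if strictly more than half of the bits are 1, else 0.
    """
    ones = 0
    for b in bits:  # single pass over the stream
        ones += b
    return 1 if 2 * ones > len(bits) else 0


def needle_stream(n, t, p, rng, planted):
    """Needle problem input over the domain {1, ..., t}.

    planted=False: n i.i.d. uniform samples; the returned needle is None.
    planted=True:  a uniformly random needle alpha is chosen, and each item
                   independently equals alpha with probability p and is
                   otherwise uniform in {1, ..., t}.
    """
    if not planted:
        return [rng.randint(1, t) for _ in range(n)], None
    alpha = rng.randint(1, t)
    items = [alpha if rng.random() < p else rng.randint(1, t) for _ in range(n)]
    return items, alpha


# Usage: draw one instance of each problem with a fixed seed.
rng = random.Random(0)
bits = coin_stream(1001, rng)
guess = majority(bits)
items, alpha = needle_stream(n=1000, t=10**6, p=0.01, rng=rng, planted=True)
```

The exact counter in `majority` matches the order of the $\Omega(\log n)$ lower bound stated above, which is why the coin problem bound is described as tight for a constant number of passes.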

Paper Structure

This paper contains 56 sections, 49 theorems, 218 equations, and 5 algorithms.

Key Result

Lemma 1.0

Assume that $(X_1,X_2,\cdots,X_n)$ is drawn from a product distribution $\mu$. Then, for any $k$-pass streaming algorithm $\mathsf{M}$ with memory size $s$ running on the input stream $X_1,\cdots,X_n$, it holds that $\mathrm{MIC}(\mathsf M,\mu)\le 2ksn$.
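
As a hedged illustration of how such an upper bound is typically used, take the bound $\mathrm{MIC}(\mathsf M,\mu)\le 2ksn$ quoted in the TL;DR and suppose, purely as an assumption for this example, that every $k$-pass algorithm solving some problem satisfies $\mathrm{MIC}(\mathsf M,\mu)\ge L(n)$. Chaining the two inequalities converts the information bound into a memory bound:

$$L(n)\;\le\;\mathrm{MIC}(\mathsf M,\mu)\;\le\;2ksn\quad\Longrightarrow\quad s\;\ge\;\frac{L(n)}{2kn}.$$

For instance, with the hypothetical choice $L(n)=c\,n\log n$ for a constant $c>0$ (an illustrative value, not a statement quoted from the paper), this gives

$$s\;\ge\;\frac{c\,n\log n}{2kn}\;=\;\frac{c\log n}{2k}\;=\;\Omega\left(\frac{\log n}{k}\right),$$

which has the same form as the coin problem bound stated in the TL;DR.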

Theorems & Definitions (98)

  • Lemma 1.0
  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4: Multi-Pass Multi-$\ell_p$-Estimation
  • Theorem 1.5: Multi-Pass Point Query and Heavy Hitters
  • Theorem 1.6
  • Theorem 1.7: Lovett and Zhang (FOCS, 2023)
  • Theorem 1.8: Multi-Pass Lower Bound for the Needle Problem
  • Theorem 1.9: Improved Upper Bound for the Needle Problem
  • ...and 88 more