A New Information Complexity Measure for Multi-pass Streaming with Applications
Mark Braverman, Sumegha Garg, Qian Li, Shuo Wang, David P. Woodruff, Jiapeng Zhang
TL;DR
This work introduces a multi-pass information complexity framework (MIC) for analyzing streaming problems under multiple passes. It establishes a general upper bound MIC$(\mathsf M,\mu)\le 2k s n$ and develops MIC$_{cond}$ as a refined tool, enabling tight lower bounds for key problems such as the coin and needle problems for any fixed number of passes. The coin problem is shown to require $\Omega\left(\frac{\log n}{k}\right)$ bits of memory for $k$ passes, while the needle problem yields an $\Omega\left(\frac{1}{p^2}\right)$ bound (with matching upper bounds in several regimes), resolving open questions and improving prior results. The authors also provide algorithmic upper bounds for the needle problem (e.g., $\mathsf M_1$ and $\mathsf M_2$) and give further streaming applications, including multi-pass lower bounds for $\ell_p$-norm estimation, point queries, heavy hitters, and compressed sensing. Altogether, the paper contributes a toolkit for multi-pass lower bounds and a unified approach to several foundational data-stream problems with stochastic inputs.
Abstract
We introduce a new notion of information complexity for multi-pass streaming problems and use it to resolve several important questions in data streams. In the coin problem, one sees a stream of $n$ i.i.d. uniform bits and one would like to compute the majority with constant advantage. We show that any constant-pass algorithm must use $\Omega(\log n)$ bits of memory, significantly extending an earlier $\Omega(\log n)$ bit lower bound for single-pass algorithms of Braverman-Garg-Woodruff (FOCS, 2020). This also gives the first $\Omega(\log n)$ bit lower bound for the problem of approximating a counter up to a constant factor in worst-case turnstile streams for more than one pass. In the needle problem, one either sees a stream of $n$ i.i.d. uniform samples from a domain $[t]$, or there is a randomly chosen needle $\alpha\in[t]$ for which each item independently is chosen to equal $\alpha$ with probability $p$, and is otherwise uniformly random in $[t]$. The problem of distinguishing these two cases is central to understanding the space complexity of the frequency moment estimation problem in random order streams. We show tight multi-pass space bounds for this problem for every $p < 1/\sqrt{n \log^3 n}$, resolving an open question of Lovett and Zhang (FOCS, 2023); even for $1$-pass our bounds are new. To show optimality, we improve both lower and upper bounds from existing results. Our information complexity framework significantly extends the toolkit for proving multi-pass streaming lower bounds, and we give a wide range of additional streaming applications of our lower bound techniques, including multi-pass lower bounds for $\ell_p$-norm estimation, $\ell_p$-point query and heavy hitters, and compressed sensing problems.
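To make the two input distributions concrete, here is a minimal Python sketch of the coin and needle problems as the abstract describes them. The function names, the tie-breaking rule, and the use of Python's `random` module are illustrative assumptions and do not come from the paper. The exact counter is only the obvious one-pass baseline, which stores a number in $0..n$ and so uses about $\log_2 n$ bits; it is not one of the paper's algorithms.

```python
import random


def coin_stream(n, rng):
    """Coin problem input: n i.i.d. uniform bits."""
    return [rng.randrange(2) for _ in range(n)]


def majority_with_counter(stream):
    """One-pass majority using an exact counter of ones.

    The counter takes values in 0..n, so it needs about log2(n) bits.
    Ties are reported as 0 (an arbitrary choice made for this sketch).
    """
    ones = 0
    n = 0
    for bit in stream:
        ones += bit
        n += 1
    return 1 if 2 * ones > n else 0


def needle_stream(n, t, p, planted, rng):
    """Needle problem input: n items from [t] = {1, ..., t}.

    planted=False: every item is uniform in [t].
    planted=True: a uniformly random needle alpha is fixed first; each item
    independently equals alpha with probability p and is otherwise uniform
    in [t].
    Returns (stream, alpha); alpha is None in the uniform case.
    """
    alpha = rng.randint(1, t) if planted else None
    stream = []
    for _ in range(n):
        if planted and rng.random() < p:
            stream.append(alpha)
        else:
            stream.append(rng.randint(1, t))
    return stream, alpha
```

A streaming algorithm for the needle problem would read `stream` once or several times with bounded memory and guess the value of `planted`. The sketch only samples the inputs and does not attempt that distinguishing step.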
