Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming
Praneeth Kacham, Rasmus Pagh, Mikkel Thorup, David P. Woodruff
TL;DR
This work advances space-efficient streaming by introducing HashPRG, a symmetric, large-alphabet pseudorandom generator that offers a tunable seed-versus-update-time trade-off for derandomizing streaming algorithms. HashPRG enables near-optimal derandomizations for key primitives, including $F_p$ moment estimation (both $p>2$ and $0<p<2$), CountSketch with tight error guarantees, and Private CountSketch, while preserving space up to polylog factors and achieving fast per-update times in the Word RAM model. The authors develop a refined independence framework and leverage symmetry to reduce derandomization overhead, delivering practical, space-efficient, low-latency streaming algorithms. A core consequence is tighter upper and matching lower bounds for estimating $\|x\|_{\infty}$ in turnstile streams, along with a general-purpose derandomization toolkit (HashPRG) that parallels Nisan’s generator but with improved performance characteristics. Overall, HashPRG enables robust, provably efficient derandomizations across a broad spectrum of streaming tasks, with direct implications for efficient data analysis in turnstile streams.
Abstract
We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and \emph{without sacrificing space} for a large number of data stream algorithms, such as $F_p$ estimation in the parameter regimes $p > 2$ and $0 < p < 2$, and CountSketch with tight estimation guarantees as analyzed by Minton and Price (SODA 2014), whose analysis assumed access to a random oracle. We also show that a recent analysis of Private CountSketch can be derandomized using our techniques. For a $d$-dimensional vector $x$ being updated in a turnstile stream, we show that $\|x\|_{\infty}$ can be estimated up to an additive error of $\varepsilon\|x\|_{2}$ using $O(\varepsilon^{-2}\log(1/\varepsilon)\log d)$ bits of space. Additionally, the update time of this algorithm is $O(\log(1/\varepsilon))$ in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors $x$ with $\|x\|_{\infty} = \Theta(\|x\|_{2})$, we show that the lower bound can be broken by giving an algorithm that uses $O(\varepsilon^{-2}\log d)$ bits of space and approximates $\|x\|_{\infty}$ up to an additive error of $\varepsilon\|x\|_{2}$. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade-off of HashPRG, we show that the update time of this algorithm is also $O(\log(1/\varepsilon))$ in the Word RAM model.
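To make the CountSketch-based estimation of $\|x\|_{\infty}$ concrete, the following is a minimal Python sketch of the textbook CountSketch data structure in a turnstile stream. It is not the derandomized algorithm of this work and does not implement HashPRG: as an assumption for illustration, the hash functions are low-degree polynomials over a prime field with coefficients drawn from Python's seeded `random.Random`. The names `CountSketch`, `update`, `estimate`, `linf_estimate`, and `PRIME` are hypothetical, and the parameter choices (about $\varepsilon^{-2}$ buckets per row, an odd number of rows) follow the standard analysis rather than the tighter bounds stated above.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # Mersenne prime used as the field size for hashing (assumption)


class CountSketch:
    """Textbook CountSketch; hashing here is a stand-in, not HashPRG."""

    def __init__(self, dim, buckets, rows=5, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.buckets = buckets
        self.table = [[0] * buckets for _ in range(rows)]
        # Degree-1 polynomial per row for bucket selection,
        # degree-3 polynomial per row for the +/-1 sign.
        self.bucket_coeffs = [[rng.randrange(1, PRIME) for _ in range(2)]
                              for _ in range(rows)]
        self.sign_coeffs = [[rng.randrange(1, PRIME) for _ in range(4)]
                            for _ in range(rows)]

    @staticmethod
    def _poly(coeffs, j):
        # Horner evaluation of the polynomial at j, modulo PRIME.
        acc = 0
        for c in coeffs:
            acc = (acc * j + c) % PRIME
        return acc

    def _bucket(self, row, j):
        return self._poly(self.bucket_coeffs[row], j) % self.buckets

    def _sign(self, row, j):
        # Low bit of the hash value chooses the sign (approximately unbiased).
        return 1 if self._poly(self.sign_coeffs[row], j) & 1 else -1

    def update(self, j, delta):
        # Turnstile update x_j += delta; delta may be negative.
        for row in range(len(self.table)):
            self.table[row][self._bucket(row, j)] += self._sign(row, j) * delta

    def estimate(self, j):
        # Median over rows of the signed counter that coordinate j maps to.
        return statistics.median(
            self._sign(row, j) * self.table[row][self._bucket(row, j)]
            for row in range(len(self.table))
        )

    def linf_estimate(self):
        # Estimate the largest |x_j| by scanning all coordinates.
        return max(abs(self.estimate(j)) for j in range(self.dim))
```

A usage example: `sk = CountSketch(dim=1000, buckets=64)`, followed by calls such as `sk.update(17, 3)` and `sk.update(17, -1)`, and finally `sk.linf_estimate()`. Scanning all `dim` coordinates at query time keeps the sketch simple; the update path touches only one counter per row.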
