Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming

Praneeth Kacham, Rasmus Pagh, Mikkel Thorup, David P. Woodruff

TL;DR

This work advances space-efficient streaming by introducing HashPRG, a symmetric, large-alphabet pseudorandom generator that offers a tunable seed-versus-update-time trade-off for derandomizing streaming algorithms. HashPRG enables near-optimal derandomizations for key primitives, including $F_p$ moment estimation (both $p>2$ and $0<p<2$), CountSketch with tight error guarantees, and Private CountSketch, while preserving space up to polylog factors and achieving fast per-update times in the Word RAM model. The authors develop a refined independence framework and leverage symmetry to reduce derandomization overhead, delivering practical, space-efficient, low-latency streaming algorithms. A core consequence is tighter upper and matching lower bounds for estimating $\|x\|_{\infty}$ in turnstile streams, along with a general-purpose derandomization toolkit (HashPRG) that parallels Nisan’s generator but with improved performance characteristics. Overall, HashPRG enables robust, provably efficient derandomizations across a broad spectrum of streaming tasks, with direct implications for efficient data analysis in turnstile streams.

Abstract

We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded computation (STOC 1990) and its applications in streaming algorithms. We describe a new generator, HashPRG, that can be thought of as a symmetric version of Nisan's generator over larger alphabets. Our generator allows a trade-off between seed length and the time needed to compute a given block of the generator's output. HashPRG can be used to obtain derandomizations with much better update time and without sacrificing space for a large number of data stream algorithms, such as $F_p$ estimation in the parameter regimes $p > 2$ and $0 < p < 2$ and CountSketch with tight estimation guarantees as analyzed by Minton and Price (SODA 2014) which assumed access to a random oracle. We also show a recent analysis of Private CountSketch can be derandomized using our techniques. For a $d$-dimensional vector $x$ being updated in a turnstile stream, we show that $\|x\|_{\infty}$ can be estimated up to an additive error of $\varepsilon\|x\|_{2}$ using $O(\varepsilon^{-2}\log(1/\varepsilon)\log d)$ bits of space. Additionally, the update time of this algorithm is $O(\log 1/\varepsilon)$ in the Word RAM model. We show that the space complexity of this algorithm is optimal up to constant factors. However, for vectors $x$ with $\|x\|_{\infty} = \Theta(\|x\|_{2})$, we show that the lower bound can be broken by giving an algorithm that uses $O(\varepsilon^{-2}\log d)$ bits of space which approximates $\|x\|_{\infty}$ up to an additive error of $\varepsilon\|x\|_{2}$. We use our aforementioned derandomization of the CountSketch data structure to obtain this algorithm, and using the time-space trade-off of HashPRG, we show that the update time of this algorithm is also $O(\log 1/\varepsilon)$ in the Word RAM model.
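
For readers unfamiliar with the data structure being derandomized, the following is a minimal Python sketch of a plain CountSketch under turnstile updates, with $\|x\|_{\infty}$ estimated as the largest absolute point estimate. It is not the paper's algorithm and does not use HashPRG: the hash and sign functions are stored as explicit random tables (which costs space linear in $d$ and is exactly what a derandomization avoids), and the names `CountSketch`, `rows`, `buckets`, and `linf_estimate` are illustrative assumptions.

```python
import random
import statistics


class CountSketch:
    """Plain CountSketch with explicitly stored random hash tables.

    Assumption: fully random tables stand in for the hash functions.
    This is for illustration only; it is not a space-efficient choice.
    """

    def __init__(self, rows, buckets, dim, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.dim = dim
        # bucket_of[r][i]: bucket of coordinate i in repetition r
        self.bucket_of = [[rng.randrange(buckets) for _ in range(dim)]
                          for _ in range(rows)]
        # sign_of[r][i]: random sign of coordinate i in repetition r
        self.sign_of = [[rng.choice((-1, 1)) for _ in range(dim)]
                        for _ in range(rows)]
        self.table = [[0] * buckets for _ in range(rows)]

    def update(self, i, delta):
        """Turnstile update x[i] += delta (delta may be negative)."""
        for r in range(self.rows):
            self.table[r][self.bucket_of[r][i]] += self.sign_of[r][i] * delta

    def estimate(self, i):
        """Median over repetitions of the signed bucket contents."""
        return statistics.median(
            self.sign_of[r][i] * self.table[r][self.bucket_of[r][i]]
            for r in range(self.rows)
        )

    def linf_estimate(self):
        """Largest absolute point estimate over all coordinates."""
        return max(abs(self.estimate(i)) for i in range(self.dim))


sketch = CountSketch(rows=5, buckets=16, dim=50, seed=1)
sketch.update(3, 5)
sketch.update(3, -2)   # deletions are allowed in the turnstile model
print(sketch.estimate(3), sketch.linf_estimate())
```

Because the sketch is linear in $x$, an update followed by its negation returns the table to its previous state, which is what makes the structure usable in turnstile streams.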

Paper Structure (45 sections, 33 theorems, 131 equations, 1 figure, 3 algorithms)


Key Result

Theorem 1.1

There is a constant $c > 0$ such that for any positive integers $n$, $b$ and $k$ satisfying $b^{k} \le 2^{cn}$, there exists a pseudorandom generator parameterized by $n$, $b$ and $k$ that converts a random seed of length $O(bkn)$ bits to a bitstring of length $b^k \cdot n$ that cannot be distinguis
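
The seed and output lengths stated in the theorem can be compared numerically. The Python sketch below evaluates $b \cdot k \cdot n$ (the seed length with the hidden constant in $O(bkn)$ taken to be 1, which is an assumption) and $b^k \cdot n$ for a few parameter choices that produce the same output length. The function name `prg_sizes` is illustrative, and the condition $b^{k} \le 2^{cn}$ is not checked because the constant $c$ is not given here.

```python
def prg_sizes(n, b, k):
    """Return (seed_bits, output_bits) for parameters n, b, k.

    Assumption: the constant hidden in the O(bkn) seed length is 1.
    """
    seed_bits = b * k * n
    output_bits = (b ** k) * n
    return seed_bits, output_bits


# Three ways to reach the same output length b^k * n = 2^16 * 64 bits.
for b, k in [(2, 16), (16, 4), (256, 2)]:
    seed_bits, output_bits = prg_sizes(64, b, k)
    print(f"b={b:3d} k={k:2d} seed~{seed_bits:6d} bits output={output_bits} bits")
# seed values: 2048, 4096, 32768; output is 4194304 bits in each case
```

A larger alphabet size $b$ reaches the same output length with a smaller $k$ at the cost of a longer seed, which is the shape of the seed-length trade-off the abstract describes.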

Figures (1)

  • Figure 1: Overview of CountSketch guarantees with different kinds of random hash functions. For simplicity we focus on the case of $r = O(\log d)$ repetitions and $d$-dimensional input vectors that contain $O(\log d)$-bit integers such that the CountSketch itself (without hash functions) uses space $D= O(t\log d)$ words. With pairwise independence we can only tightly bound the probability of exceeding error $\Delta = \|\text{tail}_{t}(x)\|_2 / \sqrt{t}$, while the other hash functions allow us to bound the probability of smaller errors. Time bounds are for implementation on a Word RAM with word size $w = O(\log d)$. Parameters with a particularly bad impact on space or time are highlighted in red color.
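
The error scale $\Delta = \|\text{tail}_{t}(x)\|_2 / \sqrt{t}$ in the caption can be computed directly for a small vector. The sketch below assumes the usual convention that $\text{tail}_{t}(x)$ is $x$ with its $t$ largest-magnitude entries removed; the function name `tail_error` is illustrative.

```python
import math


def tail_error(x, t):
    """Compute ||tail_t(x)||_2 / sqrt(t).

    Assumption: tail_t(x) drops the t largest-magnitude entries of x.
    """
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    tail = magnitudes[t:]  # everything except the top t entries
    return math.sqrt(sum(v * v for v in tail)) / math.sqrt(t)


# Top-2 entries (10 and -7) are dropped; the tail (4, 3, 0) has 2-norm 5.
print(tail_error([10, -7, 3, 4, 0], t=2))  # 5 / sqrt(2)
```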

Theorems & Definitions (55)

  • Theorem 1.1: Informal
  • Theorem 1.2: Informal, Compare with Nisan
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5: Informal
  • Theorem 1.6: Informal
  • Theorem 1.7: Informal
  • Theorem 1.8: Informal
  • Definition 2.1: $k$-wise independence
  • Theorem 2.2: Corollary 3 in CPT15
  • ...and 45 more