Adversarially Robust Dense-Sparse Tradeoffs via Heavy-Hitters

David P. Woodruff, Samson Zhou

TL;DR

This work studies adversarially robust streaming algorithms for $L_p$ estimation on turnstile streams, addressing adaptivity-induced failures. It introduces a heavy-hitter component (RobustHH) that blends a deterministic heavy-hitter routine for small universes with a robust CountSketch variant for large universes, guided by an $L_0$ estimator, along with a residual-tail estimator (ResidualEst) whose additive guarantees depend only on the tail, not its size $k$. The combination yields an improved adversarially robust $L_p$ estimator with space $\tilde{O}\left(m^c\right)$ for $p\in(1,2)$ and $c<\frac{p}{2p+1}$, plus a special-case heavy-hitter bound and a residual-focused estimation framework with space $\mathrm{poly}(1/\varepsilon,\log n)$. Empirically, on the CAIDA dataset, the residual-based approach achieves substantially smaller flip numbers and practical space savings, demonstrating robustness and scalability under adaptive inputs.

Abstract

In the adversarial streaming model, the input is a sequence of adaptive updates that defines an underlying dataset, and the goal is to approximate, collect, or compute some statistic while using space sublinear in the size of the dataset. In 2022, Ben-Eliezer, Eden, and Onak showed a dense-sparse trade-off technique that elegantly combined sparse recovery with known techniques using differential privacy and sketch switching to achieve adversarially robust algorithms for $L_p$ estimation and other problems on turnstile streams. In this work, we first give an improved algorithm for adversarially robust $L_p$-heavy hitters, utilizing deterministic turnstile heavy-hitter algorithms with better tradeoffs. We then utilize our heavy-hitter algorithm to reduce the problem to estimating the frequency moment of the tail vector. We give a new algorithm for this problem in the classical streaming setting, which achieves additive error and uses space independent of the size of the tail. We then leverage these ingredients to give an improved algorithm for adversarially robust $L_p$ estimation on turnstile streams.

Paper Structure

This paper contains 20 sections, 26 theorems, 23 equations, 1 figure, 4 algorithms.

Key Result

Theorem 1.2

Let $p\in[1,2]$. There exists an algorithm that uses $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{2.5}}m^{(2p-2)/(4p-3)}\right)$ bits of space and solves the $\varepsilon$-$L_p$-heavy hitters problem at all times in an adversarial stream of length $m$.
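
To get a feel for how the bound in Theorem 1.2 scales, the following sketch evaluates the exponent $(2p-2)/(4p-3)$ exactly for a few values of $p$. The function names and the helper `hh_space_scale` are illustrative assumptions, not part of the paper, and the polylogarithmic factors hidden by $\tilde{\mathcal{O}}$ are ignored.

```python
from fractions import Fraction

def hh_exponent(p):
    """Exponent of m in the Theorem 1.2 space bound: (2p - 2) / (4p - 3)."""
    p = Fraction(p)
    return (2 * p - 2) / (4 * p - 3)

def hh_space_scale(m, eps, p):
    """Leading term m^((2p-2)/(4p-3)) / eps^2.5, with polylog factors dropped.

    This is only a rough scale for comparison (an assumption for illustration),
    not an actual bit count.
    """
    return m ** float(hh_exponent(p)) / eps ** 2.5

# Exact exponents at the endpoints and the midpoint of the range p in [1, 2].
for p in (Fraction(1), Fraction(3, 2), Fraction(2)):
    print(f"p = {p}: exponent = {hh_exponent(p)}")
```

At $p=1$ the exponent is $0$, so the polynomial dependence on $m$ disappears from the leading term; it grows to $1/3$ at $p=1.5$ and to $2/5$ at $p=2$.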

Figures (1)

  • Figure 1: Empirical evaluations on the CAIDA dataset, comparing the flip number of the $p$-th frequency moment with that of the residual. Parameters are fixed at $\varepsilon=\alpha=0.001$ and $p=1.5$ whenever they are not the quantity being varied. Smaller flip numbers indicate less space needed by the algorithm.
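
As a rough illustration of what a flip number measures, the sketch below tracks the $p$-th frequency moment of a small turnstile stream and greedily counts how often it leaves a $(1\pm\varepsilon)$ window around the last recorded value. The greedy counting rule, the function names, and the toy streams are assumptions made for illustration; the source does not state how the experiments measure the flip number.

```python
from collections import defaultdict

def fp_series(updates, p):
    """Return the p-th frequency moment sum_i |f_i|^p after each turnstile update.

    `updates` is a list of (index, delta) pairs; delta may be negative.
    Recomputing the sum each step is slow but keeps the sketch simple.
    """
    freq = defaultdict(int)
    series = []
    for index, delta in updates:
        freq[index] += delta
        series.append(sum(abs(f) ** p for f in freq.values()))
    return series

def greedy_flip_count(values, eps):
    """Greedy count of multiplicative (1 +/- eps) changes in a nonnegative sequence.

    Each time the current value leaves the window around the last recorded
    reference value, record a flip and move the reference to the current value.
    """
    if not values:
        return 0
    ref = values[0]
    flips = 0
    for v in values[1:]:
        if not ((1 - eps) * ref <= v <= (1 + eps) * ref):
            flips += 1
            ref = v
    return flips

# Toy stream: three insertions followed by one deletion.
toy_updates = [(0, 1), (1, 1), (0, 1), (0, -1)]
toy_series = fp_series(toy_updates, 1)
toy_flips = greedy_flip_count(toy_series, 0.5)
```

A sequence that doubles at every step flips at every step for $\varepsilon=0.5$, while a slowly drifting sequence may never flip. This is the intuition behind comparing the flip number of the full moment with that of the residual.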

Theorems & Definitions (43)

  • Definition 1.1: $\varepsilon$-$L_p$-heavy hitters
  • Theorem 1.2
  • Theorem 1.3
  • Definition 1.4: Differential privacy
  • Theorem 1.5: Private median, e.g., HassidimKMMS20
  • Theorem 1.6: Advanced composition, e.g., DworkRV10
  • Theorem 1.7: Generalization of DP, e.g., DworkFHPRR15, BassilyNSSSU21
  • Theorem 1.8
  • Theorem 2.1
  • Theorem 2.2
  • ...and 33 more