Breaking the Quadratic Barrier: Robust Cardinality Sketches for Adaptive Queries

Edith Cohen, Mihir Singhal, Uri Stemmer

TL;DR

This paper addresses the robustness of cardinality sketches under adaptive queries, where prior work exhibits a quadratic barrier in the number of queries $t$ relative to the sketch size $k$. By reframing robustness through an adaptive data analysis (ADA) lens and introducing fine-grained per-key participation control via the parameter $r$, the authors design robust estimators for the bottom-$k$ sketch that can handle an exponential number of adaptive queries as long as each key participates in at most $r = \tilde{O}(k^2)$ sketches. They present two estimators: a Basic Robust Estimator and a Tracking estimator, both grounded in a refined ADA framework and a reinterpretation of the Hassidim wrapper; tracking additionally deactivates overexposed keys to preserve accuracy. Empirical results on Uniform and Pareto query patterns demonstrate large practical gains, illustrating the approach's potential to broaden the toolkit for robust, composable sketching in adaptive environments.
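For readers unfamiliar with the bottom-$k$ sketch mentioned above, the following is a minimal Python sketch of the standard, non-robust construction. It is an illustration only and is not either of the paper's robust estimators. The function names, the SHA-256 based hash, and the `seed` parameter are assumptions made for this example.

```python
import hashlib


def key_hash(key, seed=0):
    """Map a key to a pseudo-uniform value in (0, 1] (assumed hash choice)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def bottom_k_sketch(keys, k, seed=0):
    """Keep the k smallest distinct hash values of the key set, in sorted order."""
    return sorted({key_hash(x, seed) for x in keys})[:k]


def merge_sketches(a, b, k):
    """Sketch of a union, computed from the two sketches alone (composability)."""
    return sorted(set(a) | set(b))[:k]


def estimate_cardinality(sketch, k):
    """Standard (non-robust) estimate from a bottom-k sketch."""
    if len(sketch) < k:
        # Fewer than k distinct keys were seen, so the count is exact.
        return float(len(sketch))
    # (k - 1) / tau, where tau is the k-th smallest hash value.
    return (k - 1) / sketch[-1]


sketch = bottom_k_sketch(range(10_000), k=256)
print(estimate_cardinality(sketch, k=256))
```

Because the estimate depends only on the fixed hash values, an adversary who chooses queries adaptively based on earlier estimates can learn which keys have small hashes. That is the kind of failure the robust estimators summarized above are designed to withstand.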

Abstract

Cardinality sketches are compact data structures that efficiently estimate the number of distinct elements across multiple queries while minimizing storage, communication, and computational costs. However, recent research has shown that these sketches can fail under adaptively chosen queries, breaking down after approximately $\tilde{O}(k^2)$ queries, where $k$ is the sketch size. In this work, we overcome this quadratic barrier by designing robust estimators with fine-grained guarantees. Specifically, our constructions can handle an exponential number of adaptive queries, provided that each element participates in at most $\tilde{O}(k^2)$ queries. This effectively shifts the quadratic barrier from the total number of queries to the number of queries sharing the same element, which can be significantly smaller. Beyond cardinality sketches, our approach expands the toolkit for robust algorithm design.
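To make the distinction between the total number of queries and the per-element participation concrete, here is a small Python illustration. The workload (sliding windows of keys) and the helper names are hypothetical and chosen only for this example; they do not come from the paper.

```python
from collections import Counter


def participation_counts(queries):
    """Count, for each key, how many query sets contain it."""
    counts = Counter()
    for q in queries:
        counts.update(set(q))  # each key counts once per query
    return counts


def max_participation(queries):
    """Largest number of queries that share a single key."""
    counts = participation_counts(queries)
    return max(counts.values(), default=0)


# Hypothetical workload: 1000 query sets, each a window of 150 consecutive
# keys, with consecutive windows shifted by 100 keys.
queries = [range(100 * i, 100 * i + 150) for i in range(1000)]

t = len(queries)               # total number of queries
r = max_participation(queries) # per-key participation
print(t, r)
```

In this workload every key falls in at most two windows, so the per-key participation stays constant no matter how many windows are queried. A guarantee stated in terms of participation rather than the total query count is therefore much less restrictive for workloads of this shape.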

Paper Structure

This paper contains 18 sections, 14 theorems, 23 equations, 2 figures.

Key Result

Theorem 2.1

For any $\varepsilon < 1$ and $\delta \in (0, 1)$, Algorithm \ref{algo:svt-individual} is $(O(\sqrt{r \log(1/\delta)}\,\varepsilon),\ 2^{-\Omega(r)} + \delta)$-DP (see Theorem \ref{thm:TCprivacy} for more precise expressions).

Figures (2)

  • Figure 1: Number of guaranteed queries for sketch size $k$. The gain factor of TRobustEst over baseline is over two orders of magnitude with the Uniform distribution, $40\times$ for Pareto with $\alpha=2$, and $12\times$ for Pareto with $\alpha=1.5$.
  • Figure 2: ${\mathcal{B}}$

Theorems & Definitions (27)

  • Theorem 2.1 [targetcharging:ICML2023]: Privacy of Algorithm \ref{algo:svt-individual}
  • Theorem 2.1 [DworkFHPRR15, BassilyNSSSU:sicomp2021, FeldmanS17]: Generalization property of DP
  • Lemma 3.1: Generalization and sampling error bound
  • proof
  • Claim 3.2: Noise and deactivation error bounds
  • proof
  • Corollary 3.3
  • Theorem 4.1: Basic robust estimator guarantee
  • Lemma 4.2
  • proof
  • ...and 17 more