Near-optimal algorithms for private estimation and sequential testing of collision probability

Robert Busa-Fekete; Umar Syed

TL;DR

This paper studies private estimation and sequential testing of the collision probability $C(\mathbf{p})=\sum_i p_i^2$ of a discrete distribution $\mathbf{p}$. It introduces a non-interactive estimator satisfying $(\alpha,\beta)$-local differential privacy that uses salted hashing to count hash collisions across $\Theta(n^2)$ pairs of samples. The estimator reaches error at most $\varepsilon$ with near-optimal sample complexity $\tilde{O}\left(\frac{\log(1/\beta)}{\alpha^2 \varepsilon^2}\right)$ for $\alpha\le 1$, improving on previous bounds by a factor of $1/\alpha^2$. The paper also develops a sequential testing algorithm that distinguishes $C(\mathbf{p})=c_0$ from $|C(\mathbf{p})-c_0|\ge\varepsilon$ with $\tilde{O}\left(\frac{1}{\varepsilon^2}\right)$ samples, adapting automatically to unknown $\varepsilon$; a private sequential tester variant (PSQ) combines these ideas under privacy constraints. The work provides matching lower bounds (up to logarithmic factors), and its experiments show that the algorithms require significantly fewer samples than prior methods. Overall, the approaches directly exploit the $\Theta(n^2)$ potential collisions among $n$ samples to improve estimation and testing efficiency under local privacy.
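To make the role of the $\Theta(n^2)$ pairs concrete, the following is a short worked equation for the standard non-private pair-counting estimator. It is a textbook illustration, not the paper's private mechanism; the symbols $x_1,\dots,x_n$ (i.i.d. samples from $\mathbf{p}$) and $\hat{C}$ are notation assumed here.

$$\hat{C} \;=\; \binom{n}{2}^{-1} \sum_{1 \le j < k \le n} \mathbf{1}[x_j = x_k], \qquad \mathbb{E}\big[\hat{C}\big] \;=\; \Pr[x_1 = x_2] \;=\; \sum_i p_i \cdot p_i \;=\; C(\mathbf{p}).$$

Each of the $\binom{n}{2} = \Theta(n^2)$ pairs collides with probability exactly $C(\mathbf{p})$, so the average over all pairs is unbiased. For example, for the uniform distribution on $m$ symbols, $C(\mathbf{p}) = m \cdot (1/m)^2 = 1/m$.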

Abstract

We present new algorithms for estimating and testing collision probability, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies $(\alpha, \beta)$-local differential privacy and estimates collision probability with error at most $\varepsilon$ using $\tilde{O}\left(\frac{\log(1/\beta)}{\alpha^2 \varepsilon^2}\right)$ samples for $\alpha\le 1$, which improves over previous work by a factor of $\frac{1}{\alpha^2}$. We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $\varepsilon$ using $\tilde{O}\left(\frac{1}{\varepsilon^2}\right)$ samples, even when $\varepsilon$ is unknown. Our algorithms have nearly the optimal sample complexity, and in experiments we show that they require significantly fewer samples than previous methods.
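As a minimal sketch of what estimating collision probability means in the non-private setting, the Python below computes the fraction of sample pairs that collide. It is not the locally private algorithm or the sequential tester described in the abstract; the function names `collision_probability` and `estimate_collision_probability` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable


def collision_probability(p: Iterable[float]) -> float:
    """Exact collision probability sum_i p_i^2 of a known distribution."""
    return sum(q * q for q in p)


def estimate_collision_probability(samples: Iterable[Hashable]) -> float:
    """Non-private estimate: fraction of unordered sample pairs that collide.

    Illustrative only; no privacy mechanism (hashing, noise) is applied.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two samples")
    # A symbol seen c times contributes c*(c-1)/2 colliding pairs.
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return colliding_pairs / total_pairs
```

Grouping samples by value avoids enumerating all pairs explicitly, so the estimate takes time linear in the number of samples even though it averages over quadratically many pairs.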
