Equivalence Testing: The Power of Bounded Adaptivity
Diptarka Chakraborty, Sourav Chakraborty, Gunjan Kumar, Kuldeep S. Meel
TL;DR
This work tackles equivalence testing of two unknown distributions on $[n]$ within the conditional sampling framework. It introduces a tester with a single round of adaptivity that achieves $\tilde{O}(\log n)$ query complexity. The tester constructs a small collection of random subsets and uses tuple-based certificates $(i,S)$ to compare conditional probabilities and tail behavior, which lets it efficiently distinguish identical distributions from $\varepsilon$-far ones. The analysis relies on careful concentration bounds for the estimators and tail probabilities, proving correctness both when $\mathcal{P}=\mathcal{Q}$ and when $d_{TV}(\mathcal{P},\mathcal{Q}) \ge \varepsilon$, and it also outlines a fully adaptive path to $\tilde{O}(\log \log n)$ queries. Overall, the paper advances the practical parallelizability of distribution testing by showing that limited adaptivity (one round) can yield a near-optimal logarithmic dependence on the domain size, albeit with larger constants and a stronger $\varepsilon$-dependence than some prior non-adaptive methods.
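To make the query model concrete, the following Python sketch simulates a conditional sampling oracle for a known distribution and estimates the conditional probability $\mathcal{P}(i \mid S) = \mathcal{P}(i)/\mathcal{P}(S)$ that a tuple $(i,S)$ compares across the two distributions. It is a minimal illustration under stated assumptions, not the paper's tester: the function names (`cond_sample`, `estimate_conditional_mass`, `tuple_gap`), the per-tuple sample budget `m`, the 0-indexed domain, and the convention of returning a uniform element of $S$ when $\mathcal{P}(S)=0$ are all hypothetical choices made here.

```python
import random
from typing import List, Sequence


def cond_sample(p: Sequence[float], subset: Sequence[int], rng: random.Random) -> int:
    """Simulate one conditional query COND_P(S) on a 0-indexed domain.

    Returns i in S with probability p[i] / p(S). If p(S) == 0, this sketch
    assumes (hypothetical convention) that a uniform element of S is returned.
    """
    items: List[int] = list(subset)
    weights = [p[i] for i in items]
    if sum(weights) == 0:
        return rng.choice(items)
    return rng.choices(items, weights=weights, k=1)[0]


def estimate_conditional_mass(
    p: Sequence[float], i: int, subset: Sequence[int], m: int, rng: random.Random
) -> float:
    """Empirical estimate of P(i | S) = p[i] / p(S) from m conditional queries on S."""
    hits = sum(1 for _ in range(m) if cond_sample(p, subset, rng) == i)
    return hits / m


def tuple_gap(
    p: Sequence[float],
    q: Sequence[float],
    i: int,
    subset: Sequence[int],
    m: int,
    rng: random.Random,
) -> float:
    """Hypothetical per-tuple statistic for (i, S): |P_hat(i | S) - Q_hat(i | S)|.

    A large gap is evidence that P != Q; this is an illustration of the idea,
    not the exact test or thresholds used in the paper.
    """
    p_hat = estimate_conditional_mass(p, i, subset, m, rng)
    q_hat = estimate_conditional_mass(q, i, subset, m, rng)
    return abs(p_hat - q_hat)
```

For example, with $\mathcal{P} = (0.5, 0.5, 0, 0)$, $\mathcal{Q} = (0, 0.5, 0.5, 0)$ and $S = \{0, 2\}$, the tuple $(0, S)$ has $\mathcal{P}(0 \mid S) = 1$ and $\mathcal{Q}(0 \mid S) = 0$, so `tuple_gap` returns exactly 1.0 for any seed. Conditioning on a small set $S$ is what lets such a comparison expose a difference in relative mass even when the elements involved have tiny absolute probability.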
Abstract
Equivalence testing, a fundamental problem in the field of distribution testing, seeks to infer whether two unknown distributions on $[n]$ are the same or far apart in total variation distance. Conditional sampling has emerged as a powerful query model and has been investigated by theoreticians and practitioners alike, leading to the design of optimal algorithms, albeit in a sequential setting (also referred to as adaptive testers). Given the profound impact of parallel computing over the past decades, there has been a strong desire to design algorithms that enable high parallelization. Despite significant algorithmic advancements over the last decade, existing parallelizable techniques (also termed non-adaptive testers) have query complexity $\tilde{O}(\log^{12} n)$, which is prohibitively large for practical use. The primary challenge, therefore, is whether it is possible to design algorithms that enable high parallelization while achieving efficient query complexity. Our work answers this challenge affirmatively: we present a highly parallelizable tester with query complexity $\tilde{O}(\log n)$, achieved through a single round of adaptivity, marking a significant stride towards harmonizing parallelizability and efficiency in equivalence testing.
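For a rough sense of scale, the following arithmetic ignores the constants, the $\varepsilon$-dependence, and the lower-order factors hidden by $\tilde{O}(\cdot)$. It takes logarithms base 2 and a domain of size $n = 2^{20}$ (about one million), an illustrative choice made here. Under those assumptions the two bounds compare as:

$$\log^{12} n = 20^{12} = 2^{12} \cdot 10^{12} = 4.096 \times 10^{15} \qquad \text{versus} \qquad \log n = 20.$$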
