Equivalence Testing: The Power of Bounded Adaptivity

Diptarka Chakraborty; Sourav Chakraborty; Gunjan Kumar; Kuldeep S. Meel

TL;DR

This work tackles equivalence testing of two unknown distributions on $[n]$ within the conditional sampling framework. It introduces a one-round adaptive tester that achieves $\tilde{O}(\log n)$ query complexity by constructing a small set of random subsets and using tuple-based certificates $(i,S)$ to compare conditional probabilities and tail behavior, enabling efficient discrimination between identical and $\varepsilon$-far distributions. The analysis relies on careful concentration bounds for the estimators and tail probabilities, proving correctness when $\mathcal{P}=\mathcal{Q}$ and when $d_{TV}(\mathcal{P},\mathcal{Q}) \ge \varepsilon$, while also outlining a fully adaptive path to $\tilde{O}(\log \log n)$ queries. Overall, the paper advances the practical parallelizability of distribution testing by showing that limited adaptivity (one round) can yield near-optimal logarithmic dependence on the domain size, albeit with heavier constants and stronger $\varepsilon$-dependence than some prior non-adaptive methods.
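The following minimal Python sketch makes the conditional sampling primitives in this summary concrete. It is not the paper's tester. It shows a COND oracle that samples from a distribution restricted to a subset $S$, and a check that compares empirical estimates of the conditional probability of an element $i$ within $S$ under $\mathcal{P}$ and $\mathcal{Q}$, in the spirit of the tuple $(i,S)$ comparisons described above. The function names, the sample count `m`, the tolerance `tol`, and the zero-mass fallback are illustrative assumptions.

```python
import random
from typing import List, Sequence


def cond_sample(p: Sequence[float], subset: List[int], rng: random.Random) -> int:
    """Illustrative COND oracle: return i in subset with probability p[i] / p(subset).

    Assumption: if p(subset) == 0 we fall back to a uniform element of subset;
    this convention is chosen for the sketch and may differ from Definition 2.1.
    """
    weights = [p[i] for i in subset]
    if sum(weights) == 0:
        return rng.choice(subset)
    return rng.choices(subset, weights=weights, k=1)[0]


def estimate_cond_prob(p: Sequence[float], i: int, subset: List[int], m: int, rng: random.Random) -> float:
    """Estimate p(i) / p(subset) as the fraction of m conditional samples equal to i."""
    hits = sum(cond_sample(p, subset, rng) == i for _ in range(m))
    return hits / m


def tuple_disagrees(p, q, i, subset, m, tol, rng) -> bool:
    """Hypothetical check on a tuple (i, S): do the estimates under P and Q differ by more than tol?"""
    gap = estimate_cond_prob(p, i, subset, m, rng) - estimate_cond_prob(q, i, subset, m, rng)
    return abs(gap) > tol


# Usage: P is uniform on [4]; Q moves mass from element 1 to element 0.
rng = random.Random(0)
p = [0.25, 0.25, 0.25, 0.25]
q = [0.40, 0.10, 0.25, 0.25]
print(tuple_disagrees(p, p, 0, [0, 1], 2000, 0.1, rng))  # identical inputs: False with overwhelming probability
print(tuple_disagrees(p, q, 0, [0, 1], 2000, 0.1, rng))  # conditional probabilities 0.5 vs 0.8: True with overwhelming probability
```

A real tester would choose the subsets, sample counts, and thresholds according to the paper's analysis. This sketch fixes them by hand only to show what a single tuple comparison measures.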

Abstract

Equivalence testing, a fundamental problem in distribution testing, asks whether two unknown distributions on $[n]$ are identical or far apart in total variation distance. Conditional sampling has emerged as a powerful query model and has been investigated by theoreticians and practitioners alike, leading to optimal algorithms, albeit in a sequential setting (such algorithms are also referred to as adaptive testers). Given the profound impact of parallel computing over the past decades, there is a strong desire for algorithms that enable high parallelization. Despite significant algorithmic advances over the last decade, existing parallelizable techniques (also termed non-adaptive testers) have $\tilde{O}(\log^{12} n)$ query complexity, which is prohibitively large for practical use. The primary challenge, therefore, is whether one can design algorithms that enable high parallelization while achieving efficient query complexity. Our work answers this challenge affirmatively: we present a highly parallelizable tester with a query complexity of $\tilde{O}(\log n)$, achieved through a single round of adaptivity, marking a significant stride towards harmonizing parallelizability and efficiency in equivalence testing.
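For reference, the total variation distance between distributions $\mathcal{P}$ and $\mathcal{Q}$ on $[n]$ is $d_{TV}(\mathcal{P},\mathcal{Q}) = \frac{1}{2}\sum_{i \in [n]} |\mathcal{P}(i) - \mathcal{Q}(i)|$. The small Python helper below (names hypothetical) evaluates it for explicitly given probability vectors. A tester never sees $\mathcal{P}$ and $\mathcal{Q}$ in this form and accesses them only through queries, so the helper only illustrates the quantity that separates the "identical" and "$\varepsilon$-far" cases.

```python
from typing import Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """d_TV(P, Q) = (1/2) * sum over i in [n] of |P(i) - Q(i)|."""
    if len(p) != len(q):
        raise ValueError("P and Q must be defined on the same domain [n]")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def is_eps_far(p: Sequence[float], q: Sequence[float], eps: float) -> bool:
    """True in the 'far' case of the testing problem, i.e. d_TV(P, Q) >= eps."""
    return total_variation(p, q) >= eps


p = [0.25, 0.25, 0.25, 0.25]
q = [0.40, 0.10, 0.25, 0.25]
print(total_variation(p, q))  # approximately 0.15 (floating point)
print(is_eps_far(p, q, 0.1))  # True
```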

Paper Structure

This paper contains 11 sections, 10 theorems, 36 equations, and 3 algorithms.

Introduction
Notations and Preliminaries
Related Work
An Efficient One-Round Adaptive Algorithm
High-Level Overview
Algorithm Description
Technical Analysis
Conclusion
Acknowledgements
Missing Proofs
An $\tilde{O}(\log \log n)$-query fully adaptive algorithm

Key Result

Lemma 2.3

Let $X_1,\dots,X_m$ be $m$ iid random variables, each taking values in $\{0,1\}$ with $\mathbb{E}[X_i] = p$. Then for any $\gamma \in (0,1),$
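Lemma 2.3 is listed as an additive Chernoff bound. One common two-sided form for such variables is $\Pr\left[\left|\frac{1}{m}\sum_{i=1}^{m} X_i - p\right| \ge \gamma\right] \le 2e^{-2m\gamma^2}$, and this is the form assumed in the sketch below. The exact statement and constants in the paper may differ. The Python sketch (function names are illustrative, not from the paper) inverts this bound into the sample-size rule $m \ge \ln(2/\delta)/(2\gamma^2)$ and checks it by simulating Bernoulli($p$) draws.

```python
import math
import random


def additive_tail_bound(m: int, gamma: float) -> float:
    """Assumed two-sided additive bound: Pr[|mean - p| >= gamma] <= 2 exp(-2 m gamma^2)."""
    return 2.0 * math.exp(-2.0 * m * gamma ** 2)


def samples_needed(gamma: float, delta: float) -> int:
    """Smallest m with 2 exp(-2 m gamma^2) <= delta, i.e. m >= ln(2 / delta) / (2 gamma^2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * gamma ** 2))


def empirical_failure_rate(p: float, m: int, gamma: float, trials: int, rng: random.Random) -> float:
    """Fraction of trials where the mean of m Bernoulli(p) draws misses p by at least gamma."""
    failures = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(m)) / m
        if abs(mean - p) >= gamma:
            failures += 1
    return failures / trials


m = samples_needed(gamma=0.1, delta=0.05)
print(m)  # 185
print(empirical_failure_rate(0.3, m, 0.1, 2000, random.Random(1)))  # typically well below 0.05
```

The simulated failure rate is usually far below $\delta$, since the bound holds for every $p$ and is therefore loose for any particular one.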

Theorems & Definitions (27)

Definition 2.1: $\mathsf{COND}$ Query Model
Definition 2.2
Lemma 2.3: Additive Chernoff bound
Theorem 4.1
Lemma 4.2
Lemma 4.3
Lemma 4.4
Proof
Lemma 4.5
Proof
...and 17 more
