Interactive Proofs For Distribution Testing With Conditional Oracles
Ari Biswas, Mark Bun, Clément Canonne, Satchit Sivakumar
TL;DR
The paper tackles the high sample cost in distribution property testing by introducing a polylogarithmic-query, polylogarithmic-sample, sublinear-communication interactive proof framework that uses pairwise conditional (PCond) queries. Central to the approach is the unlabelled bucket histogram as a succinct statistic for label-invariant properties, augmented by learning bucket masses via pairwise comparisons and using neighborhood/moat concepts to certify point probabilities and support sizes. The authors provide both lower bounds showing PCond-query limitations and constructive protocols enabling tolerant verification of label-invariant properties with near-optimal communication, thus achieving exponential savings over plain sampling. Together, these results advance practical verification of distribution properties on large domains and establish a foundation for efficient, conditional-access verified testing. The work also surveys related interactive-proof/conditional-sampling literature and proposes several open directions for tightening complexity and enabling broader composability.
Abstract
We revisit the framework of interactive proofs for distribution testing, first introduced by Chiesa and Gur (ITCS 2018), which has recently experienced a surge in interest, accompanied by notable progress (e.g., Herman and Rothblum, STOC 2022, FOCS 2023; Herman, RANDOM~2024). In this model, a data-poor verifier determines whether a probability distribution has a property of interest by interacting with an all-powerful, data-rich but untrusted prover bent on convincing them that it has the property. While prior work gave sample-, time-, and communication-efficient protocols for testing and estimating a range of distribution properties, they all suffer from an inherent issue: for most interesting properties of distributions over a domain of size $N$, the verifier must draw at least $Ω(\sqrt{N})$ samples of its own. While sublinear in $N$, this is still prohibitive for large domains encountered in practice. In this work, we circumvent this limitation by augmenting the verifier with the ability to perform an exponentially smaller number of more powerful (but reasonable) \emph{pairwise conditional} queries, effectively enabling them to perform ``local comparison checks'' of the prover's claims. We systematically investigate the landscape of interactive proofs in this new setting, giving polylogarithmic query and sample protocols for (tolerantly) testing all \emph{label-invariant} properties, thus demonstrating exponential savings without compromising on communication, for this large and fundamental class of testing tasks.
