Distribution-Free Proofs of Proximity
Hugo Aaronson, Tom Gur, Ninad Rajgopal, Ron D. Rothblum
TL;DR
This paper studies distribution-free property testing augmented with interactive proofs of proximity (df-IPPs), addressing the challenge of unknown input distributions by leveraging an untrusted prover. It proves that every language in NC admits a df-IPP with favorable trade-offs between queries, samples, and communication for non-negligible proximity parameters, and shows near-optimal performance in several regimes. It develops a reduction to low-degree-extended polynomial testing (PVAL) and introduces distance-preservation lemmas and polynomial folding techniques to handle distributional uncertainty, including special distribution families like ρ-dispersed and product distributions. The results reveal both the power and limits of df-IPPs, including separations from distribution-free testers and implications for symmetric and RLCC languages, thereby enriching the landscape of sublinear verification and delegation of computation. The work unifies interactive proofs, distribution testing, and property testing to enable efficient verification under arbitrary data-generating environments with sublinear resources.
Abstract
Motivated by the fact that input distributions are often unknown in advance, distribution-free property testing considers a setting where the algorithmic task is to accept functions $f : [n] \to \{0,1\}$ with a certain property $P$ and reject functions that are $\eta$-far from $P$, where the distance is measured according to an arbitrary and unknown input distribution $D$ over $[n]$. As usual in property testing, the tester can only make a sublinear number of input queries, but as the distribution is unknown, we also allow a sublinear number of samples from the distribution $D$. In this work we initiate the study of distribution-free interactive proofs of proximity (df-IPPs), in which the distribution-free testing algorithm is assisted by an all-powerful but untrusted prover. Our main result is that for any problem $P \in$ NC, any proximity parameter $\eta > 0$, and any (trade-off) parameter $t \leq \sqrt{n}$, we construct a df-IPP for $P$ with respect to $\eta$ that has query and sample complexities $t + O(1/\eta)$ and communication complexity $\tilde{O}(n/t + 1/\eta)$. For $t$ as above and sufficiently large $\eta$ (namely, when $\eta > t/n$), this result matches the parameters of the best-known general-purpose IPPs in the standard uniform setting. Moreover, for such $t$, its parameters are optimal up to poly-logarithmic factors under reasonable cryptographic assumptions for the same regime of $\eta$ as the uniform setting, i.e., when $\eta \geq 1/t$. For small $\eta$ (i.e., $\eta < t/n$), our protocol has communication complexity $\Omega(1/\eta)$, which is worse than the $\tilde{O}(n/t)$ communication complexity of the uniform IPPs (with the same query complexity). To narrow this gap, we show that for IPPs over specialised but large distribution families, such as sufficiently smooth distributions and product distributions, the communication complexity reduces to $\tilde{O}(n/t^{1-o(1)})$.
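As a rough illustration of two quantities in the abstract, the following Python sketch computes the distance between two functions under a given distribution, and evaluates the stated complexity bounds with all constants and poly-logarithmic factors dropped. The names `distance_under` and `rough_costs` are hypothetical and do not come from the paper. The sketch measures distance to a single function rather than to a property (which would take a minimum over all functions in the property), and it does not implement the protocol itself.

```python
from fractions import Fraction


def distance_under(dist, f, g):
    """Pr_{x ~ dist}[f(x) != g(x)], with dist a list of probabilities over [n]."""
    return sum(p for x, p in enumerate(dist) if f[x] != g[x])


def rough_costs(n, t, inv_eta):
    """Illustrative only: (queries/samples, communication) from the stated
    bounds t + O(1/eta) and O~(n/t + 1/eta), ignoring constants and polylogs.
    inv_eta stands for 1/eta, kept as an integer to avoid rounding."""
    return t + inv_eta, n // t + inv_eta


n = 8
f = [0, 0, 0, 0, 0, 0, 0, 1]
g = [0] * n  # the all-zeros function

uniform = [Fraction(1, n)] * n
skewed = [Fraction(1, 14)] * 7 + [Fraction(1, 2)]  # half the mass on the last point

d_uniform = distance_under(uniform, f, g)  # f and g differ on one point of eight
d_skewed = distance_under(skewed, f, g)    # the same point now carries mass 1/2

# Large-eta regime: eta = 1/100 > t/n = 1/1000, so 1/eta does not dominate n/t.
large_eta = rough_costs(10**6, 1000, 100)
# Small-eta regime: eta = 1/10000 < t/n, so the 1/eta term dominates.
small_eta = rough_costs(10**6, 1000, 10_000)
```

The same pair of functions is 1/8-far under the uniform distribution but 1/2-far under the skewed one, which is why the tester needs samples from $D$ in addition to queries. In the second call to `rough_costs`, the $1/η$ term exceeds $n/t$ by a factor of ten, matching the small-$η$ gap described in the abstract.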
