Distribution-Free Proofs of Proximity

Hugo Aaronson; Tom Gur; Ninad Rajgopal; Ron D. Rothblum

TL;DR

This paper studies distribution-free property testing augmented with interactive proofs of proximity (df-IPPs), addressing the challenge of unknown input distributions by leveraging an untrusted prover. It proves that every language in NC admits a df-IPP with favorable trade-offs between queries, samples, and communication for non-negligible proximity parameters, and shows near-optimal performance in several regimes. It develops a reduction to low-degree-extended polynomial testing (PVAL) and introduces distance-preservation lemmas and polynomial folding techniques to handle distributional uncertainty, including special distribution families like ρ-dispersed and product distributions. The results reveal both the power and limits of df-IPPs, including separations from distribution-free testers and implications for symmetric and RLCC languages, thereby enriching the landscape of sublinear verification and delegation of computation. The work unifies interactive proofs, distribution testing, and property testing to enable efficient verification under arbitrary data-generating environments with sublinear resources.

Abstract

Motivated by the fact that input distributions are often unknown in advance, distribution-free property testing considers a setting where the algorithmic task is to accept functions $f : [n] \to \{0,1\}$ with a certain property P and reject functions that are $η$-far from P, where the distance is measured according to an arbitrary and unknown input distribution $D$ over $[n]$. As usual in property testing, the tester can only make a sublinear number of input queries, but as the distribution is unknown, we also allow a sublinear number of samples from the distribution D. In this work we initiate the study of distribution-free interactive proofs of proximity (df-IPPs), in which the distribution-free testing algorithm is assisted by an all-powerful but untrusted prover. Our main result is that for any problem $P \in \mathsf{NC}$, any proximity parameter $η > 0$, and any (trade-off) parameter $t \leq \sqrt{n}$, we construct a df-IPP for P with respect to $η$ that has query and sample complexities $t+O(1/η)$ and communication complexity $\tilde{O}(n/t + 1/η)$. For t as above and sufficiently large $η$ (namely, when $η > t/n$), this result matches the parameters of the best-known general-purpose IPPs in the standard uniform setting. Moreover, for such t, its parameters are optimal up to poly-logarithmic factors under reasonable cryptographic assumptions for the same regime of $η$ as the uniform setting, i.e., when $η \geq 1/t$. For small $η$ (i.e., $η < t/n$), our protocol has communication complexity $Ω(1/η)$, which is worse than the $\tilde{O}(n/t)$ communication complexity of the uniform IPPs (with the same query complexity). To narrow this gap, we show that for IPPs over specialised but large distribution families, such as sufficiently smooth distributions and product distributions, the communication complexity reduces to $\tilde{O}(n/t^{1-o(1)})$.
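The distance notion in the abstract can be made concrete with a small Python sketch. It estimates $\Pr_{i \sim D}[f(i) \neq g(i)]$ from samples and queries, and shows that the same pair of functions can be close under one distribution and far under another. The names `empirical_distance` and `make_sampler`, the toy functions, and the two weight vectors are illustrative assumptions and do not come from the paper.

```python
import random

def empirical_distance(f, g, sample_from_d, num_samples, rng):
    """Estimate Pr_{i ~ D}[f(i) != g(i)] from samples of D and queries to f and g."""
    disagreements = 0
    for _ in range(num_samples):
        i = sample_from_d(rng)   # one sample from the (unknown) distribution D
        if f(i) != g(i):         # one query to each function at the sampled point
            disagreements += 1
    return disagreements / num_samples

def make_sampler(weights):
    """Hypothetical stand-in for D: a fixed weight vector over {0, ..., n-1}."""
    points = list(range(len(weights)))
    return lambda rng: rng.choices(points, weights=weights, k=1)[0]

n = 8
f = lambda i: 0
g = lambda i: 1 if i == 0 else 0             # differs from f only at index 0

uniform = make_sampler([1] * n)              # uniform distribution over the domain
skewed = make_sampler([1] + [0] * (n - 1))   # all probability mass on index 0

rng = random.Random(0)
dist_uniform = empirical_distance(f, g, uniform, 2000, rng)  # close to 1/8
dist_skewed = empirical_distance(f, g, skewed, 2000, rng)    # every sample disagrees
```

Under the uniform distribution the two functions disagree on one point out of eight, while under the skewed distribution they disagree on every sample. A verifier that does not know $D$ therefore cannot rely on uniform distance alone.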

Paper Structure (49 sections, 43 theorems, 87 equations, 3 figures, 1 table, 9 algorithms)

Introduction
Distribution-free Interactive Proofs of Proximity
Our Results
Distribution-free $\mathsf{IPP}$s for $\mathsf{NC}$
$\mathsf{IPP}$s for $\mathsf{NC}$: The case of small $\varepsilon$
Product Distributions in the White-Box model:
On the power of distribution-free $\mathsf{IPP}$s
Symmetric languages.
(Relaxed) self-correctable languages.
Technical Overview
Proof outline of Theorem 1.1
Proof outlines of Theorems 1.2 and 1.3
Uniform Distance Preservation Lemma.
$\rho$-dispersed distributions.
Related Work
...and 34 more sections

Key Result

Theorem 1.1

For every language $L$ in logspace-uniform $\mathsf{NC}$ and every trade-off parameter $\tau=\tau(n) \leq \sqrt{n}$, there exists a distribution-free $\mathsf{IPP}$ for $L$ with proximity parameter $\varepsilon\geq \Omega\left(\frac{\log^3 (n)}{n}\right)$, query complexity $\tau+O\left(\frac{1}{\var

Figures (3)

Figure 1: The shaded region ($B_{\mathcal{U}}(X) \cap B_{\mathcal{D}}(X)$) consists of the points in $\{0,1\}^n$ that are $\varepsilon$-close to $X$ with respect to both $\mathcal{D}$ and $\mathcal{U}$. The soundness promise of the interactive reduction $\Pi'$ ensures that, with high probability, any string in $\mathsf{PVAL}(J,\Vec{v})$ lies in at most one of $B_{\mathcal{U}}(X)$ and $B_{\mathcal{D}}(X)$, and hence not in the shaded region.
Figure 2: In the uniform $\mathsf{IPP}$ for $\mathsf{PVAL}$, the prover sends the $(m-1)$-variate $\mathsf{LDE}$ of each row of $X$ evaluated on $J_2$ (the column indices of $J$), in the form of the purported matrix $Y' \in \mathbb{F}^{k \times t}$. However, to ensure consistency of $Y'$ with respect to $\mathsf{PVAL}(J,\Vec{v})$, for any $j = (a,b) \in J$, the univariate $\mathsf{LDE}$ of the $b^{\text{th}}$ column of $Y'$ evaluated on $a$ is required to be equal to $\Vec{v}[j]$.
Figure 3: During the polynomial folding protocol, the prover sends the univariate $\mathsf{LDE}$ of each row of $X$ evaluated on the columns of $J$, collected in the matrix $Y \in \mathbb{F}^{k_1 \times t}$. For any $j = (j_1,j_2) \in J$, the univariate $\mathsf{LDE}$ of the $j_2^{\text{th}}$ column of $Y$ restricted to $j_1$ is equal to $\Vec{v}[j]$.
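The captions for Figures 2 and 3 rely on evaluating a univariate low-degree extension ($\mathsf{LDE}$) of a row or column at a point. A minimal Python sketch of that single operation follows, using Lagrange interpolation over a prime field. The function name `lde_eval`, the interpolation set $\{0, \dots, k-1\}$, and the modulus 97 are illustrative assumptions; the paper's actual field, index embedding, and multivariate encoding are not given in this summary.

```python
def lde_eval(values, a, p):
    """Evaluate at `a` the unique polynomial of degree < k over F_p that
    takes the value values[h] at each h in {0, ..., k-1}.

    Assumes p is prime and p >= k, so every denominator is invertible mod p.
    """
    k = len(values)
    total = 0
    for h in range(k):
        num, den = 1, 1
        for j in range(k):
            if j != h:
                num = num * (a - j) % p   # numerator of the h-th Lagrange basis at a
                den = den * (h - j) % p   # denominator of the h-th Lagrange basis
        # pow(den, -1, p) is the modular inverse (Python 3.8+)
        total = (total + values[h] * num * pow(den, -1, p)) % p
    return total

p = 97                       # hypothetical small prime field for illustration
column = [3, 1, 4, 1]        # hypothetical column of field elements
agrees_on_domain = all(lde_eval(column, h, p) == column[h] for h in range(len(column)))
linear_example = lde_eval([2, 5], 10, p)   # the line 2 + 3x evaluated at x = 10
```

On the interpolation set the extension reproduces the original entries, which is the property behind consistency checks of the form "the $\mathsf{LDE}$ of a column evaluated at an index equals the claimed value".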

Theorems & Definitions (111)

Theorem 1.1: Distribution-Free $\mathsf{IPP}$ for $\mathsf{NC}$
Remark 1
Theorem 1.2: $\mathsf{IPP}$ for $\mathsf{NC}$ over $\rho$-dispersed distributions
Theorem 1.3: $\mathsf{IPP}$s for $\mathsf{NC}$ over $m$-product distributions
Theorem 1.4: Distribution-free $\mathsf{IPP}$s for symmetric languages
Proposition 1.5: Generic Transformations for $\mathsf{IPP}$s for RLCCs
Corollary 1.6: Complexity separations
Proposition 1.7: Distribution-free $\mathsf{IPP}$s vs. uniform testing
Definition 2.1: Hybrid Metrics
Remark 2
...and 101 more
