Better Private Distribution Testing by Leveraging Unverified Auxiliary Data
Maryam Aliakbarpour, Arnav Burudgunte, Clément Canonne, Ronitt Rubinfeld
TL;DR
This work extends the augmented testing framework to the differentially private (DP) setting, enabling hypothesis testing on sensitive data with the help of untrusted public auxiliary information. It gives private algorithms for identity (and uniformity) testing and for closeness testing that adapt to the quality of the auxiliary advice $\hat{p}$. Their sample complexities interpolate between those of standard DP testers and those of oracle-like augmented testers. The core technical contributions are a two-step privatized flattening procedure and a private $\ell_2$-norm verification step, which combine the ADKR19 mappings with Laplace noise to guarantee DP while preserving testing power. Complementary information-theoretic lower bounds show that the proposed algorithms are nearly optimal (up to logarithmic factors) in all regimes. Together, they show when high-quality auxiliary data yields substantial savings in sample complexity under privacy constraints and when it cannot help. The results advance private inference with unverified public data, with potential impact on privacy-preserving data analysis in settings rich with auxiliary information.
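To make the "flattening" and "Laplace noise" ingredients above more concrete, here is a minimal Python/NumPy sketch. It is not the authors' algorithm and does not implement the ADKR19 mappings. It assumes the standard flattening rule from the distribution testing literature, in which element $i$ is split into $n_i = 1 + \lfloor m \hat{p}_i \rfloor$ sub-buckets. It estimates the squared $\ell_2$ norm by counting collisions and privatizes that count with a naive Laplace mechanism calibrated to substitution neighbours, where the sensitivity is $n - 1$. All function names and the parameter `m` are hypothetical.

```python
import numpy as np


def bucket_counts(p_hat, m):
    """Hypothetical helper: element i is split into n_i = 1 + floor(m * p_hat[i]) sub-buckets."""
    p_hat = np.asarray(p_hat, dtype=float)
    return 1 + np.floor(m * p_hat).astype(int)


def flatten_samples(samples, p_hat, m, rng):
    """Map each sample i to a uniformly random sub-bucket of element i.

    Returns indices into the flattened domain, whose size is sum(n_i) <= k + m.
    Each sample is mapped independently, using fresh randomness.
    """
    n = bucket_counts(p_hat, m)
    offsets = np.concatenate(([0], np.cumsum(n)[:-1]))
    samples = np.asarray(samples, dtype=int)
    sub = rng.integers(0, n[samples])  # uniform in [0, n_i) for each sample
    return offsets[samples] + sub


def flattened_l2_sq(p, p_hat, m):
    """Exact squared l2 norm of the flattened distribution: sum_i p_i^2 / n_i."""
    p = np.asarray(p, dtype=float)
    n = bucket_counts(p_hat, m)
    return float(np.sum(p ** 2 / n))


def private_collision_estimate(flat_samples, epsilon, rng):
    """Estimate ||p'||_2^2 from pairwise collisions, adding Laplace noise to the count.

    Assumption: neighbouring datasets differ by substituting one sample. That changes
    the number of colliding pairs by at most n - 1, so Laplace noise with scale
    (n - 1) / epsilon makes the released count epsilon-DP (a naive calibration).
    """
    n = len(flat_samples)
    if n < 2:
        raise ValueError("need at least two samples")
    _, counts = np.unique(flat_samples, return_counts=True)
    collisions = float(np.sum(counts * (counts - 1) // 2))
    noisy = collisions + rng.laplace(0.0, (n - 1) / epsilon)
    return noisy / (n * (n - 1) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, m = 1000, 1000
    p = rng.dirichlet(np.ones(k))
    p_hat = p  # an accurate (hypothetical) hint
    print("flattened ||p'||_2^2 =", flattened_l2_sq(p, p_hat, m), "bound 1/m =", 1 / m)
    samples = rng.choice(k, size=5000, p=p)
    flat = flatten_samples(samples, p_hat, m, rng)
    print("noisy collision estimate =", private_collision_estimate(flat, epsilon=1.0, rng=rng))
```

With an accurate hint ($\hat{p} = p$), every $n_i \ge m p_i$, so $\sum_i p_i^2 / n_i \le 1/m$ while the domain grows to at most $k + m$. A poor hint carries no such guarantee. This is one reason a tester that relies on unverified advice may need to check the flattened norm rather than trust $\hat{p}$.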
Abstract
We extend the framework of augmented distribution testing (Aliakbarpour, Indyk, Rubinfeld, and Silwal, NeurIPS 2024) to the differentially private setting. This captures scenarios where a data analyst must perform hypothesis testing tasks on sensitive data but can leverage prior knowledge about the data distribution that is public, though possibly erroneous or untrusted. In this augmented setting, we design private algorithms for three flagship distribution testing tasks (uniformity, identity, and closeness testing) whose sample complexity scales smoothly with the claimed quality of the auxiliary information. We complement our algorithms with information-theoretic lower bounds showing that their sample complexity is optimal up to logarithmic factors.
