Differentially Private Permutation Tests: Applications to Kernel Methods

Ilmun Kim, Antonin Schrab

TL;DR

The paper tackles private hypothesis testing by introducing differentially private permutation tests that preserve finite-sample validity under $(\varepsilon,\delta)$-DP. It develops a refined privatization via a quantile representation, enabling finite-sample control of type I error while achieving meaningful power guarantees. The authors instantiate the framework with kernel-based statistics, deriving dpMMD and dpHSIC, and establish minimax optimal separation rates across privacy regimes, along with negative results for U-statistic based private tests. Extensive simulations on synthetic data and CelebA images demonstrate strong empirical performance and practical viability, with open-source code to enable adoption. Overall, the work provides a principled, scalable approach to privacy-preserving hypothesis testing with strong theoretical guarantees and practical kernel-based tools.

Abstract

Recent years have witnessed growing concerns about the privacy of sensitive data. In response to these concerns, differential privacy has emerged as a rigorous framework for privacy protection, gaining widespread recognition in both academic and industrial circles. While substantial progress has been made in private data analysis, existing methods often suffer from impracticality or a significant loss of statistical efficiency. This paper aims to alleviate these concerns in the context of hypothesis testing by introducing differentially private permutation tests. The proposed framework extends classical non-private permutation tests to private settings, maintaining both finite-sample validity and differential privacy in a rigorous manner. The power of the proposed test depends on the choice of a test statistic, and we establish general conditions for consistency and non-asymptotic uniform power. To demonstrate the utility and practicality of our framework, we focus on reproducing kernel-based test statistics and introduce differentially private kernel tests for two-sample and independence testing: dpMMD and dpHSIC. The proposed kernel tests are straightforward to implement, applicable to various types of data, and attain minimax optimal power across different privacy regimes. Our empirical evaluations further highlight their competitive power under various synthetic and real-world scenarios, emphasizing their practical value. The code is publicly available to facilitate the implementation of our framework.
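The general idea of a permutation test whose statistics are perturbed before they are compared can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `noisy_permutation_pvalue`, the mean-difference statistic, the restriction to data in $[0,1]$, and the noise scale `2 * sensitivity / epsilon` are all choices made here for illustration, and the sketch by itself does not establish any privacy guarantee. The calibrated noise level and the privacy proof are in the paper.

```python
import numpy as np


def noisy_permutation_pvalue(x, y, epsilon, n_perms=199, seed=0):
    """Permutation p-value in which every statistic receives i.i.d. Laplace noise.

    Illustrative sketch only: the statistic, the sensitivity bound, and the
    noise scale are assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Assumption: observations are bounded in [0, 1]; clipping enforces it.
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    y = np.clip(np.asarray(y, dtype=float), 0.0, 1.0)
    pooled = np.concatenate([x, y])
    m = len(x)

    def stat(z):
        # Absolute difference between the means of the two groups.
        return abs(z[:m].mean() - z[m:].mean())

    # Replacing one observation in [0, 1] moves either group mean by at most
    # 1 / (group size), for every permutation of the pooled sample.
    sensitivity = 1.0 / min(m, len(y))
    scale = 2.0 * sensitivity / epsilon  # assumed noise scale (illustrative)

    stats = np.empty(n_perms + 1)
    stats[0] = stat(pooled)  # statistic on the original ordering
    for b in range(1, n_perms + 1):
        stats[b] = stat(rng.permutation(pooled))

    # The same noise distribution is added to the original and permuted statistics.
    noisy = stats + rng.laplace(0.0, scale, size=n_perms + 1)

    # Usual permutation p-value, computed on the noisy statistics.
    return (1 + np.sum(noisy[1:] >= noisy[0])) / (n_perms + 1)
```

Because the same i.i.d. noise is added to every statistic, the noisy statistics remain exchangeable under the null hypothesis, which is the property the standard permutation p-value argument relies on.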

Paper Structure

This paper contains 103 sections, 39 theorems, 382 equations, 16 figures, 3 algorithms.

Key Result

Lemma 1

Suppose that an algorithm $\mathcal{A}$ is $(\varepsilon,\delta)$-differentially private. Then for an arbitrary randomized function $f$, the composition $f \circ \mathcal{A}$ is also $(\varepsilon,\delta)$-differentially private.
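The post-processing property can be illustrated with a small sketch. The helper names (`laplace_mechanism`, `private_mean`, `reject`), the bounded-data assumption, and the threshold are hypothetical and chosen here for illustration; only the Laplace mechanism itself, with noise scale equal to the $\ell_1$-sensitivity divided by $\varepsilon$, is standard.

```python
import numpy as np


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release `value` plus Laplace noise of scale sensitivity / epsilon."""
    return value + rng.laplace(0.0, sensitivity / epsilon)


def private_mean(data, epsilon, rng):
    """Noisy mean of data assumed to lie in [0, 1] (clipping enforces the bound)."""
    data = np.clip(np.asarray(data, dtype=float), 0.0, 1.0)
    # Replacing one record changes the mean by at most 1 / n.
    sensitivity = 1.0 / len(data)
    return laplace_mechanism(data.mean(), sensitivity, epsilon, rng)


def reject(released_value, threshold=0.5):
    """Post-processing step: depends on the data only through the released value."""
    return bool(released_value > threshold)


rng = np.random.default_rng(0)
data = np.full(1000, 0.9)
released = private_mean(data, epsilon=1.0, rng=rng)
decision = reject(released)
```

Since `reject` never touches `data` directly, the lemma implies that `decision` inherits the privacy guarantee of `released`, whatever rule is applied inside `reject`.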

Figures (16)

  • Figure 1: Perturbed uniform $d$-dimensional densities on $[0,1]^d$ with varying perturbation amplitude $a$.
  • Figure 2: Comparing uniform vs. perturbed uniform while varying the privacy level $\varepsilon$. We set the sample sizes $m = n = 3000$ and dimension $d=1$, and change the privacy level $\varepsilon$ and perturbation amplitude $a$ as follows: (Left) Privacy level $\varepsilon$ from $1/n$ to $10/\sqrt{n}$, perturbation amplitude $a=0.2$. (Middle) Privacy level $\varepsilon$ from $10/\sqrt{n}$ to $1$, perturbation amplitude $a=0.15$. (Right) Privacy level $\varepsilon$ from $1$ to $\sqrt{n}$, perturbation amplitude $a=0.1$.
  • Figure 3: Comparing uniform vs. perturbed uniform while varying the sample sizes $m=n$. We set the dimension $d=1$ and perturbation amplitude $a=0.1$. We change the privacy level as follows: (Left) Privacy level $\varepsilon=10/\sqrt{n}$. (Middle) Privacy level $\varepsilon=1$. (Right) Privacy level $\varepsilon=\sqrt{n}/10$.
  • Figure 4: Comparing uniform vs. perturbed uniform while varying the dimension $d$. We set the sample sizes $m = n = 3000$ and perturbation amplitude $a=0.2$. We change the privacy level as follows: (Left) Privacy level $\varepsilon=10/\sqrt{n}$. (Middle) Privacy level $\varepsilon=1$. (Right) Privacy level $\varepsilon=\sqrt{n}/10$.
  • Figure 5: Selected CelebA images (https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) of dimension $3\times 178\times 218$.
  • ...and 11 more figures

Theorems & Definitions (60)

  • Definition 1: Differential Privacy
  • Lemma 1: Post-Processing
  • Lemma 2: Composition
  • Lemma 3: Group Privacy
  • Definition 2: Global $\ell_p$-Sensitivity
  • Definition 3: Laplace Mechanism
  • Lemma 4: Differential Privacy of Laplace Mechanism
  • Remark 1: Gaussian Mechanism
  • Example 1: Sensitivity of Integral Probability Metric
  • Theorem 1: Validity Guarantee
  • ...and 50 more
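
For readers who want the first few listed items in symbols, the standard textbook statements of differential privacy, global $\ell_p$-sensitivity, and the Laplace mechanism are given below. The notation ($D$, $D'$, $S$, $\Delta_f$, $\zeta_j$) is assumed here and may differ from the paper's own.

```latex
% Differential privacy: for all neighboring datasets D, D' (differing in one
% entry) and all measurable sets S,
\mathbb{P}\bigl(\mathcal{A}(D) \in S\bigr)
  \le e^{\varepsilon}\, \mathbb{P}\bigl(\mathcal{A}(D') \in S\bigr) + \delta .

% Global \ell_p-sensitivity of a function f with values in \mathbb{R}^d:
\Delta_f = \sup_{D, D' \text{ neighboring}} \bigl\| f(D) - f(D') \bigr\|_p .

% Laplace mechanism (with \Delta_f the \ell_1-sensitivity): release
f(D) + \frac{\Delta_f}{\varepsilon}\,(\zeta_1, \ldots, \zeta_d),
\qquad \zeta_1, \ldots, \zeta_d \overset{\text{i.i.d.}}{\sim} \mathrm{Laplace}(0, 1),
% which satisfies (\varepsilon, 0)-differential privacy.
```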