Locally Differentially Private Two-Sample Testing

Alexander Kent, Thomas B. Berrett, Yi Yu

Abstract

We consider the problem of two-sample testing under a local differential privacy constraint, where a permutation procedure is used to calibrate the tests. We develop testing procedures that are optimal up to logarithmic factors, both for general discrete distributions and for continuous distributions subject to a smoothness constraint. Both non-interactive and interactive tests are considered, and we show that allowing interactivity improves the minimax separation rates. Our results show that permutation procedures remain feasible in practice under local privacy constraints, even though only the private views, and not the non-private data, can be permuted. Further, through a refined theoretical analysis of the permutation procedure, we avoid the equal sample size assumption that has been made in the permutation testing literature both with and without a privacy constraint. Lastly, we conduct numerical experiments which demonstrate the performance of our proposed tests and verify the theoretical findings, especially the improved performance enabled by allowing interactivity.
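To make the idea of permuting private views concrete, the following is a minimal sketch of a non-interactive test for discrete data. It is not the procedure developed in the paper: the privacy mechanism (k-ary randomized response), the test statistic (squared $L_2$ distance between empirical frequencies of the private views) and all function names are assumptions chosen for illustration only. The sample sizes $n_1$ and $n_2$ are allowed to differ.

```python
import math
import random
from collections import Counter


def randomized_response(x, d, eps, rng):
    """k-ary randomized response: an eps-LDP view of a label x in {0, ..., d-1}.

    Assumes d >= 2. This mechanism is an illustrative choice, not the paper's.
    """
    p_keep = math.exp(eps) / (math.exp(eps) + d - 1)
    if rng.random() < p_keep:
        return x
    # Otherwise report one of the other d - 1 labels uniformly at random.
    other = rng.randrange(d - 1)
    return other if other < x else other + 1


def l2_statistic(z1, z2, d):
    """Squared L2 distance between the empirical frequencies of two samples."""
    c1, c2 = Counter(z1), Counter(z2)
    return sum((c1[j] / len(z1) - c2[j] / len(z2)) ** 2 for j in range(d))


def private_permutation_test(x1, x2, d, eps, n_perm=200, seed=0):
    """Permutation p-value computed from the private views only."""
    rng = random.Random(seed)
    # Each data holder releases a private view; the raw data are never pooled.
    z1 = [randomized_response(x, d, eps, rng) for x in x1]
    z2 = [randomized_response(x, d, eps, rng) for x in x2]
    observed = l2_statistic(z1, z2, d)

    pooled = z1 + z2
    n1 = len(z1)
    exceed = 0
    for _ in range(n_perm):
        # Permute the private views and split at n1, so n1 and n2 may differ.
        rng.shuffle(pooled)
        if l2_statistic(pooled[:n1], pooled[n1:], d) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic among the permuted ones.
    return (1 + exceed) / (1 + n_perm)
```

For example, `private_permutation_test([0] * 300, [1] * 200, d=4, eps=2.0)` returns a p-value that would be compared against the chosen significance level. Because the privatised samples are exchangeable under the null hypothesis, permuting them calibrates the test without access to the raw data.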

Paper Structure

This paper contains 59 sections, 30 theorems, 253 equations, 7 figures, 1 table.

Key Result

Proposition 1

Fix $n \in \mathbb{N}$ and let $\{Z_1, \hdots, Z_n\} \subset \mathcal{Z}^n$ be $\varepsilon$-LDP views of data $\{X_1, \hdots, X_n\} \subset \mathcal{X}^n$. Let $f : \mathcal{Z}^{n} \rightarrow \mathcal{A}$ for any output space $\mathcal{A}$ be a data-independent function. The random variable $f(Z_1

Figures (7)

  • Figure 1: Illustration of the interactive model in the two-sample setting.
  • Figure 2: Power curves for the $L_1$-problem (first and third columns) and the $L_2$-problem (second and fourth columns). The first two columns show results for our test procedures; the last two columns show results for the test of Canonne:2024:Heterogeneous. The first, second and third rows correspond to settings of varying privacy, sample size and dimension respectively. Bars indicate pointwise 95% confidence intervals. Privacy level $\varepsilon = \infty$ corresponds to a centred $\chi^2$-test applied to the unprivatised data.
  • Figure 3: Plots of transformed separation against dimension for the non-interactive and interactive tests in the $L_1$-problem (a, b) and $L_2$-problem (c, d). $n_1 = n_2 = 1000$, $d = 8$.
  • Figure 4: Plots of transformed separation against dimension for the non-interactive and interactive tests in the $L_1$-problem (a, b) and $L_2$-problem (c, d). $n_1 = n_2 = 1000$.
  • Figure 5: Power curves for the testing problems with the distributions $P_{Y, \gamma}^{\mathrm{Beta}}$, $P_{Y, \gamma}^{\mathrm{Tri}}$ and $P_{Y, \gamma, k}^{\mathrm{Cos}}$ (first, second and third columns respectively), with $k = 1$ (first and second rows) and $k = 2$ (third row). The first, second and third rows correspond to settings of varying privacy, sample size and manually specified truncation parameters respectively. Bars indicate pointwise 95% confidence intervals. Privacy level $\varepsilon = \infty$ corresponds to an MMD-based test applied to the unprivatised data.
  • ...and 2 more figures

Theorems & Definitions (69)

  • Proposition 1: Dwork:2006:CalibratingNoise
  • Proposition 2
  • Theorem 3
  • Lemma 4
  • Remark 1
  • Theorem 5
  • Remark 2: Comparisons with Mun:2025:LocalPerm
  • Theorem 6
  • Remark 3: Comparisons with Mun:2025:LocalPerm
  • Remark 4
  • ...and 59 more