Differentially private Bayesian tests

Abhisek Chakraborty, Saptati Datta

TL;DR

This paper integrates differential privacy with Bayesian hypothesis testing by formulating DP Bayes factors based on test statistics within a principled data-generative framework. It uses a hierarchical, partitioned model with mixture priors to keep Bayes factors bounded under privacy constraints, enabling a Laplace-based privacy mechanism and a data-driven cut-off to preserve predefined error rates. The authors derive closed-form expressions and consistency results for Bayes factors under z, t, χ^2, and F tests, and provide practical algorithms for hyperparameter tuning and power optimization. Through simulations and a DAIC-WOZ gender-difference case study, the approach demonstrates interpretable Bayesian evidence under privacy budgets, offering a scalable alternative to differentially private frequentist testing with formal privacy guarantees.

Abstract

Differential privacy has emerged as a significant cornerstone in the realm of scientific hypothesis testing utilizing confidential data. In reporting scientific discoveries, Bayesian tests are widely adopted since they effectively circumvent the key criticisms of P-values, namely, lack of interpretability and inability to quantify evidence in support of the competing hypotheses. We present a novel differentially private Bayesian hypothesis testing framework that arises naturally under a principled data generative mechanism, inherently maintaining the interpretability of the resulting inferences. Furthermore, by focusing on differentially private Bayes factors based on widely used test statistics, we circumvent the need to model the complete data generative mechanism and ensure substantial computational benefits. We also provide a set of sufficient conditions to establish results on Bayes factor consistency under the proposed framework. The utility of the devised technology is showcased via several numerical experiments.

Paper Structure

This paper contains 17 sections, 10 theorems, 40 equations, 4 figures, 1 algorithm.

Key Result

Lemma 2.1

For $\omega_n\in(0, 1/2)$, $\hbox{\rm BF}^t_{10}(\mathbf{x}^{(i)} \mid \tau_{0,i}^2,\tau_{1,i}^2 ,\omega_n)$ lies in the interval $[\frac{\omega_n}{1-\omega_n}, \frac{1-\omega_n}{\omega_n}]$.
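
The bound in Lemma 2.1 is what makes a Laplace-based mechanism workable: on the log scale, each partition-level Bayes factor lies in $[-\log\frac{1-\omega_n}{\omega_n}, \log\frac{1-\omega_n}{\omega_n}]$, so its contribution to a summary has finite sensitivity. The following is a minimal sketch of that idea, not the authors' algorithm. It assumes that the data are split into disjoint partitions, that the released quantity is the average of the partition-level log Bayes factors, and that one record affects exactly one partition. The function names (`log_bf_bound`, `laplace_noise`, `private_mean_log_bf`) are hypothetical.

```python
import math
import random


def log_bf_bound(omega: float) -> float:
    """Bound on |log BF| implied by Lemma 2.1: log((1 - omega) / omega)."""
    if not 0.0 < omega < 0.5:
        raise ValueError("omega must lie in (0, 1/2)")
    return math.log((1.0 - omega) / omega)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_mean_log_bf(log_bfs, omega, epsilon, rng):
    """Release the mean of bounded log Bayes factors with Laplace noise.

    Assumption (not taken from the paper): changing one record alters a
    single partition's log Bayes factor, which can move by at most
    2 * bound, so the mean over k partitions has sensitivity 2 * bound / k.
    """
    if not log_bfs:
        raise ValueError("need at least one partition-level log Bayes factor")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    bound = log_bf_bound(omega)
    # Clipping is a no-op when the inputs already satisfy the lemma's bound.
    clipped = [min(max(v, -bound), bound) for v in log_bfs]
    k = len(clipped)
    sensitivity = 2.0 * bound / k
    return sum(clipped) / k + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2024)
    # Illustrative partition-level log Bayes factors (made-up numbers).
    example = [0.4, -0.1, 0.7, 0.2, 0.3]
    print(private_mean_log_bf(example, omega=0.1, epsilon=1.0, rng=rng))
```

Under these assumptions the noise scale shrinks as the number of partitions grows and grows as $\omega_n$ approaches zero, which reflects the trade-off between a wider Bayes factor range and the privacy noise.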

Figures (4)

  • Figure 1: Determining the size-$\alpha$ Bayes factor cut-off ($t$-test). We present the distribution of the log Bayes factor under $\hbox{H}_0$ in the non-private and private settings, together with the corresponding Bayes factor cut-offs, given test size $\alpha = 0.05$, sample size $n=100$ and privacy budget $\varepsilon = 1$, with fixed hyper-parameters $M_n = 5$ given $a=3$.
  • Figure 2: Comparison of non-private and private Bayesian $t$-tests under different prior specifications and privacy budgets.
  • Figure 3: DAIC-WOZ database. The plot illustrates the gender-specific densities of the PHQ-8 scores, revealing potential differences between genders.
  • Figure 4: DAIC-WOZ database. Privatized log Bayes factor against the null hypothesis, as a function of standardized effect size, for varying values of the privacy parameter $\varepsilon$.
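
The cut-off shown in Figure 1 can be read as an upper quantile of the null distribution of the log Bayes factor. A minimal sketch of that step follows, assuming that draws of the (private or non-private) log Bayes factor under $\hbox{H}_0$ are already available from simulation. The helper name `empirical_cutoff` and the placeholder draws are hypothetical and do not come from the paper.

```python
import math
import random


def empirical_cutoff(null_draws, alpha):
    """Return a draw c such that at most a fraction alpha of the draws exceed c.

    Rejecting the null when the statistic is strictly greater than c then
    has empirical size at most alpha on the supplied null draws.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not null_draws:
        raise ValueError("need at least one null draw")
    ordered = sorted(null_draws)
    n = len(ordered)
    # Allow at most floor(alpha * n) draws above the cut-off.
    allowed = math.floor(alpha * n)
    return ordered[max(n - 1 - allowed, 0)]


if __name__ == "__main__":
    rng = random.Random(7)
    # Placeholder null draws: stand-ins for simulated log Bayes factors
    # under H0, not output of the authors' procedure.
    draws = [rng.gauss(-1.0, 0.5) for _ in range(10_000)]
    print(empirical_cutoff(draws, alpha=0.05))
```

Because privacy noise widens the null distribution, a cut-off computed from privatized draws would typically sit further out than the non-private one for the same $\alpha$.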

Theorems & Definitions (14)

  • Definition 1: Differential privacy [dwork2006differential]
  • Definition 2: Global sensitivity
  • Lemma 2.1
  • Lemma 2.2
  • Corollary 2.2.1
  • Theorem 2.3
  • proof
  • Definition 3: Conventional Bayes factor consistency [chib2016]
  • Theorem 3.1
  • Corollary 3.1.1
  • ...and 4 more