Differentially Private Conditional Independence Testing

Iden Kalemaj, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

TL;DR

This work addresses conditional independence (CI) testing under differential privacy (DP), focusing on continuous conditioning variables $Z$. It introduces two DP CI tests, PrivGCM and PrivCRT, built respectively on the generalized covariance measure and the conditional randomization test, and each accompanied by rigorous type-I error control and power guarantees. The analysis shows that privacy noise can, in some regimes, improve finite-sample type-I error behavior, while larger sample sizes mitigate power loss, with PrivCRT often delivering stronger power under the model-X assumption. The proposed methods are validated empirically on synthetic and real data, with sensitive domains such as genomics and clinical data as the motivating applications.

Abstract

Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Candès et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests with rigorous theoretical guarantees that work for the general case when $Z$ is continuous.

Paper Structure (47 sections, 22 theorems, 47 equations, 10 figures, 2 algorithms)

This paper contains 47 sections, 22 theorems, 47 equations, 10 figures, 2 algorithms.

Key Result

Lemma 2.3

Let $\varepsilon > 0$ and $f \colon \mathcal{D} \to \mathbb{R}^d$ be a function with $\ell_1$-sensitivity $\Delta_f$. Let $W \sim \mathrm{Lap}(0,\Delta_f/\varepsilon)$ be a noise vector from the Laplace distribution with scale parameter $\Delta_f/\varepsilon$. The Laplace Mechanism that, on input $\
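
The lemma statement above is cut off in this summary. As an illustration of the setup it describes, the standard Laplace mechanism releases $f(D) + W$, adding independent $\mathrm{Lap}(0, \Delta_f/\varepsilon)$ noise to each of the $d$ coordinates. The following is a minimal sketch of that standard construction, not the authors' code; the function names `laplace_noise` and `laplace_mechanism` and the toy private-mean example are assumptions introduced here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Lap(0, scale) sample (hypothetical helper, not from the paper)."""
    # If E1 and E2 are i.i.d. exponential with mean `scale`,
    # then E1 - E2 follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(f_value, l1_sensitivity, epsilon, rng=None):
    """Return f(D) + W, with W_i ~ Lap(0, l1_sensitivity / epsilon) independently.

    f_value        : list of floats, the exact value f(D) in R^d
    l1_sensitivity : Delta_f, an upper bound on ||f(D) - f(D')||_1 over neighbors
    epsilon        : privacy parameter, must be positive
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if l1_sensitivity <= 0:
        raise ValueError("l1_sensitivity must be positive")
    rng = rng if rng is not None else random.Random()
    scale = l1_sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in f_value]


# Toy usage (assumption): a mean of n values known to lie in [0, 1].
# Replacing one record changes the mean by at most 1/n, so Delta_f = 1/n.
data = [0.2, 0.9, 0.4, 0.7]
exact_mean = sum(data) / len(data)
private_mean = laplace_mechanism([exact_mean], 1.0 / len(data), 1.0, random.Random(0))[0]
```

The noise scale grows as $\varepsilon$ shrinks and as $\Delta_f$ grows, which is why the sensitivity of a test statistic has to be bounded before it can be privatized this way.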

Figures (10)

  • Figure 1: Type-I error control of PrivToT, private Kendall, PrivGCM, and PrivCRT (under the null): the first two fail to control Type-I error.
  • Figure 2: Comparison of the power of private and nonprivate GCM tests as the dependence strength $\beta$ increases. At $d=5$, the (nonprivate) GCM fails to provide type-I error control when $\beta=0$.
  • Figure 3: Comparison of the type-I error and power of private and nonprivate GCM tests as the dataset size $n$ increases. Again, at $d=5$ with $\beta=0$, the (nonprivate) GCM fails to provide type-I error control even at large $n$ (in fact, its type-I error gets worse with $n$).
  • Figure 4: Comparing power of private and nonprivate CRT tests as we increase dependence $\beta$.
  • Figure 5: Comparison of the type-I error and power of private and nonprivate CRT tests as we increase the dataset size $n$.
  • ...and 5 more figures

Theorems & Definitions (43)

  • Definition 2.1: Differential privacy (DworkMNS16)
  • Definition 2.2: $\ell_1$-sensitivity
  • Lemma 2.3: Laplace Mechanism (DworkMNS16)
  • Lemma 2.4: Post-Processing (DworkMNS16)
  • Theorem 2.5: Restated Theorem 5 of KusnerSSW16
  • Definition 3.1: Good fit
  • Theorem 3.2
  • Theorem 3.3
  • Lemma 3.3
  • Theorem 4.1: Report Noisy Max (DworkR14, McKennaS20, Ding21)
  • ...and 33 more