Differentially Private Conditional Independence Testing

Iden Kalemaj; Shiva Prasad Kasiviswanathan; Aaditya Ramdas

TL;DR

This work addresses conditional independence (CI) testing under differential privacy (DP), focusing on continuous conditioning variables $Z$. It introduces two DP CI tests, PrivGCM and PrivCRT. Each is built on an established non-private test and comes with rigorous guarantees on type-I error control and power. The analysis shows three things. In some regimes, the privacy noise can even improve finite-sample type-I error control. The power lost to privacy shrinks as the sample size grows. PrivCRT, which relies on the model-X assumption, often delivers stronger power. The proposed methods are validated empirically on synthetic and real data, demonstrating robust privacy-preserving CI testing with practical applicability to sensitive domains such as genomics and clinical data.
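PrivCRT builds on the conditional randomization test (CRT) of Candès et al. (2016). The sketch below shows only the non-private CRT under the model-X assumption, that is, the conditional law of $X$ given $Z$ is known and can be sampled from. It does not show the paper's private version. The Gaussian linear model, the statistic, and the names `crt_pvalue`, `sample_x_given_z` and `abs_resid_corr` are hypothetical illustrations, not the authors' construction.

```python
import numpy as np


def crt_pvalue(x, y, z, sample_x_given_z, statistic, n_resamples=199, seed=None):
    """Non-private CRT p-value under the model-X assumption.

    sample_x_given_z(z, rng) must return a fresh draw of X from the known
    conditional law P(X | Z), one value per row of z.
    """
    rng = np.random.default_rng(seed)
    t_obs = statistic(x, y, z)
    # Recompute the statistic on resampled copies of X; Y and Z stay fixed.
    t_null = np.array(
        [statistic(sample_x_given_z(z, rng), y, z) for _ in range(n_resamples)]
    )
    # Finite-sample valid p-value: the +1 counts the observed statistic itself.
    return (1 + np.count_nonzero(t_null >= t_obs)) / (n_resamples + 1)


# Hypothetical model-X setup: X | Z ~ N(Z @ a, 1) with a known.
rng = np.random.default_rng(0)
n, d = 300, 3
a = np.array([1.0, -0.5, 0.25])
b = np.array([0.5, 0.5, -1.0])
z = rng.standard_normal((n, d))
x = z @ a + rng.standard_normal(n)


def sample_x_given_z(z, rng):
    # Draw X from its assumed known conditional law N(Z @ a, 1).
    return z @ a + rng.standard_normal(z.shape[0])


def abs_resid_corr(x, y, z):
    # |corr(Y, X - E[X | Z])|, using the known model-X mean Z @ a.
    return abs(np.corrcoef(x - z @ a, y)[0, 1])


y_alt = z @ b + 1.0 * x + rng.standard_normal(n)  # Y depends on X given Z
y_null = z @ b + rng.standard_normal(n)           # X independent of Y given Z
p_alt = crt_pvalue(x, y_alt, z, sample_x_given_z, abs_resid_corr, seed=1)
p_null = crt_pvalue(x, y_null, z, sample_x_given_z, abs_resid_corr, seed=1)
print(p_alt, p_null)
```

The p-value can never fall below $1/(K+1)$ for $K$ resamples. This floor is one reason the number of resamples matters when a small significance level is used.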

Abstract

Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Candès et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests with rigorous theoretical guarantees that work for the general case when $Z$ is continuous.
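The sketch below makes the generalized covariance measure (GCM) concrete. It computes the non-private statistic of Shah and Peters (2020): regress $X$ on $Z$ and $Y$ on $Z$, multiply the residuals, and normalise their mean. It assumes ordinary least squares with an intercept as the regression method. The paper's PrivGCM and its choice of regressor are not reproduced here (the outline mentions kernel ridge regression). The helper names `ols_residuals` and `gcm_test` are hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def ols_residuals(target, z):
    """Residuals of an ordinary least-squares fit of target on [1, Z]."""
    design = np.column_stack([np.ones(len(target)), z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def gcm_test(x, y, z):
    """Non-private GCM statistic and two-sided asymptotic p-value."""
    r = ols_residuals(x, z) * ols_residuals(y, z)  # products of residuals
    n = len(r)
    # sqrt(n) * mean / std of the products; approximately N(0, 1) under the
    # null when both regressions are accurate enough.
    t = sqrt(n) * r.mean() / r.std()
    p = erfc(abs(t) / sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


rng = np.random.default_rng(0)
n, d = 500, 3
z = rng.standard_normal((n, d))
x = z @ np.array([1.0, -0.5, 0.25]) + rng.standard_normal(n)
y_null = z @ np.array([0.5, 0.5, -1.0]) + rng.standard_normal(n)
y_alt = y_null + x  # adds a direct dependence of Y on X
t_alt, p_alt = gcm_test(x, y_alt, z)
t_null, p_null = gcm_test(x, y_null, z)
```

Because the statistic is only asymptotically normal, its type-I error depends on how well the regressions fit. This is the failure mode the figure captions describe for the nonprivate GCM at $d=5$.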

Paper Structure (47 sections, 22 theorems, 47 equations, 10 figures, 2 algorithms)

Introduction
Our Contributions.
Related Work
Private Conditional Independence Testing.
Private (non-conditional) Independence Testing.
Non-private Conditional Independence Testing.
Preliminaries
Notation.
Background on Differential Privacy
Background on Hypothesis Testing
Type-I error and validity.
Power.
Residuals of Kernel Ridge Regression
Private Generalized Covariance Measure
GCM Test.
...and 32 more sections

Key Result

Lemma 2.3

Let $\varepsilon > 0$ and $f \colon \mathcal{D} \to \mathbb{R}^d$ be a function with $\ell_1$-sensitivity $\Delta_f$. Let $W \in \mathbb{R}^d$ be a noise vector whose coordinates are drawn i.i.d. from $\mathrm{Lap}(0,\Delta_f/\varepsilon)$, the Laplace distribution with scale parameter $\Delta_f/\varepsilon$. The Laplace Mechanism that, on input …
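The following is a minimal NumPy sketch of the Laplace Mechanism as stated in Lemma 2.3. It adds i.i.d. $\mathrm{Lap}(0,\Delta_f/\varepsilon)$ noise to each coordinate of the function's output. The function name `laplace_mechanism` is hypothetical. The caller must supply a correct bound on the $\ell_1$-sensitivity, because the code cannot check it. The usage line assumes two things: records are clipped to $[0,1]$, and neighbouring datasets differ by replacing one record. Under those assumptions, the mean of $n$ records has $\ell_1$-sensitivity $1/n$.

```python
import numpy as np


def laplace_mechanism(value, l1_sensitivity, epsilon, rng=None):
    """Return value + W, where W has i.i.d. Lap(0, l1_sensitivity / epsilon) entries.

    The output is epsilon-DP only if l1_sensitivity really bounds the
    l1-sensitivity of the function that produced `value`; this is not checked.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if l1_sensitivity < 0:
        raise ValueError("l1_sensitivity must be non-negative")
    rng = np.random.default_rng(rng)
    value = np.asarray(value, dtype=float)
    noise = rng.laplace(loc=0.0, scale=l1_sensitivity / epsilon, size=value.shape)
    return value + noise


# Usage (assumed setting): records clipped to [0, 1] and neighbouring datasets
# differ by replacing one record, so the mean has l1-sensitivity 1 / n.
data = np.clip(np.random.default_rng(1).normal(0.5, 0.2, size=1000), 0.0, 1.0)
private_mean = laplace_mechanism(data.mean(), 1.0 / len(data), epsilon=1.0, rng=2)
```

By Post-Processing (Lemma 2.4), anything computed from `private_mean` alone, such as a test decision, inherits the same privacy guarantee.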

Figures (10)

Figure 1: Type-I error of PrivToT, private Kendall, PrivGCM, and PrivCRT under the null: the first two fail to control the type-I error.
Figure 2: Comparison of the power of private and nonprivate GCM tests as the dependence strength $\beta$ increases. At $d=5$, the (nonprivate) GCM fails to provide type-I error control when $\beta=0$.
Figure 3: Comparison of the type-I error and power of private and nonprivate GCM tests as the dataset size $n$ increases. Again, at $d=5$ with $\beta=0$, the (nonprivate) GCM fails to provide type-I error control even at large $n$ (in fact, its type-I error gets worse with $n$).
Figure 4: Comparison of the power of private and nonprivate CRT tests as the dependence strength $\beta$ increases.
Figure 5: Comparison of the type-I error and power of private and nonprivate CRT tests as the dataset size $n$ increases.
...and 5 more figures

Theorems & Definitions (43)

Definition 2.1: Differential privacy [DworkMNS16]
Definition 2.2: $\ell_1$-sensitivity
Lemma 2.3: Laplace Mechanism [DworkMNS16]
Lemma 2.4: Post-Processing [DworkMNS16]
Theorem 2.5: Restated Theorem 5 of [KusnerSSW16]
Definition 3.1: Good fit
Theorem 3.2
Theorem 3.3
Lemma 3.3
Theorem 4.1: Report Noisy Max [DworkR14, McKennaS20, Ding21]
...and 33 more
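Theorem 4.1 refers to Report Noisy Max. The sketch below shows one classical form, assuming every score changes by at most $\Delta$ between neighbouring datasets. It adds independent $\mathrm{Lap}(2\Delta/\varepsilon)$ noise to each score and releases only the index of the largest noisy score, which is $\varepsilon$-DP. The works cited for Theorem 4.1 also study other noise distributions (e.g., exponential noise), so this need not match the variant used in the paper. The name `report_noisy_max` is hypothetical.

```python
import numpy as np


def report_noisy_max(scores, sensitivity, epsilon, rng=None):
    """Index of the largest score after adding i.i.d. Lap(2 * sensitivity / epsilon) noise.

    Only the index is released; publishing the noisy scores as well would
    need a separate privacy analysis.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng(rng)
    scores = np.asarray(scores, dtype=float)
    noisy = scores + rng.laplace(0.0, 2.0 * sensitivity / epsilon, size=scores.shape)
    return int(np.argmax(noisy))


# Usage: pick the best of three hypothetical candidates by score.
choice = report_noisy_max([3.0, 10.0, 4.0], sensitivity=1.0, epsilon=1.0, rng=0)
```

The factor 2 in the noise scale covers scores that may move in opposite directions between neighbours. When all scores are monotone (for example, counts that all change in the same direction), noise of scale $\Delta/\varepsilon$ suffices for the Laplace version.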
