Differentially private testing for relevant dependencies in high dimensions

Patrick Bastian, Holger Dette, Martin Dunsche

TL;DR

This work addresses testing for practically relevant dependencies among the components of a high-dimensional vector under differential privacy (DP). It reframes the problem as a composite null hypothesis $H_0(\Delta): \|\theta\|_\infty \le \Delta$, where $\theta$ collects the pairwise associations between components (such as Kendall's $\tau$), and develops a DP-compatible bootstrap method that exploits estimation of the extremal set to gain power in sparse settings. The authors establish rigorous statistical guarantees and demonstrate strong finite-sample performance, especially when a gap separates the extremal coordinates from the remaining ones, with applications to genomic and proteomic medical data. The approach advances DP high-dimensional inference for complex dependence structures and yields practical tools for private dependence testing.
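
The extremal set is the set of coordinates at which $|\theta_j|$ attains $\|\theta\|_\infty$; the intuition is that, near the boundary of the null, only these near-maximal coordinates drive the behaviour of the maximum. A minimal non-private sketch, assuming a point estimate `theta_hat` of the association vector is already available: it computes the excess $\|\hat\theta\|_\infty - \Delta$ and a plug-in extremal set. The tolerance `tol` is a hypothetical tuning constant, not the paper's rule, and the sketch is not the authors' bootstrap procedure (no privacy noise, no calibration).

```python
import numpy as np

def sup_norm_excess(theta_hat, delta):
    """Point estimate of ||theta||_inf - Delta; positive values point towards the alternative."""
    return float(np.max(np.abs(np.asarray(theta_hat, dtype=float))) - delta)

def estimated_extremal_set(theta_hat, tol):
    """Indices j with |theta_hat_j| >= max_k |theta_hat_k| - tol.

    `tol` is a hypothetical tuning constant standing in for whatever
    threshold a real procedure would use; it is not the paper's choice.
    """
    abs_theta = np.abs(np.asarray(theta_hat, dtype=float))
    return np.flatnonzero(abs_theta >= abs_theta.max() - tol)

theta_hat = [0.05, -0.41, 0.02, 0.39, 0.10]
print(estimated_extremal_set(theta_hat, tol=0.05))  # -> [1 3]
print(sup_norm_excess(theta_hat, delta=0.3))
```

Here coordinates 1 and 3 are retained because their absolute values lie within `tol` of the maximum; a clear gap between these and the remaining coordinates makes such a selection stable.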

Abstract

We investigate the problem of detecting dependencies between the components of a high-dimensional vector. Our approach advances the existing literature in two important respects. First, we consider the problem under privacy constraints. Second, instead of testing whether the coordinates are pairwise independent, we are interested in determining whether certain pairwise associations between the components (such as all pairwise Kendall's $\tau$ coefficients) do not exceed a given threshold in absolute value. Considering hypotheses of this form is motivated by the observation that in the high-dimensional regime, it is rare and perhaps impossible to have a null hypothesis that can be modeled exactly by assuming that all pairwise associations are precisely equal to zero. The formulation of the null hypothesis as a composite hypothesis makes the problem of constructing tests already non-standard in the non-private setting. Additionally, under privacy constraints, state-of-the-art procedures rely on permutation approaches that are rendered invalid under a composite null. We propose a novel bootstrap-based methodology that is especially powerful in sparse settings, develop theoretical guarantees under mild assumptions and show that the proposed method enjoys good finite-sample properties even in the high-privacy regime. Additionally, we present applications to medical data that showcase the applicability of our methodology.
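
To make the null hypothesis concrete, the sketch below computes the vector $\hat\theta$ of all $p = d(d-1)/2$ pairwise Kendall's $\tau$ coefficients from an $n \times d$ sample, so that $\|\hat\theta\|_\infty$ can be compared with a threshold $\Delta$. It assumes NumPy, uses tau-a (no tie correction), and the function names are hypothetical. It is a non-private illustration only: a private test would additionally add calibrated noise and a valid calibration of the critical value, neither of which is shown.

```python
import itertools
import numpy as np

def kendall_tau_a(x, y):
    """Kendall's tau-a: average of sign(x_i - x_j) * sign(y_i - y_j) over pairs i < j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    # The full n x n sum counts every unordered pair twice; diagonal terms are zero.
    return float((dx * dy).sum() / (n * (n - 1)))

def pairwise_kendall_vector(data):
    """Stack tau-a for all p = d(d-1)/2 pairs of columns of an (n, d) array."""
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    return np.array([kendall_tau_a(data[:, j], data[:, k])
                     for j, k in itertools.combinations(range(d), 2)])

rng = np.random.default_rng(1)
sample = rng.standard_normal((200, 5))  # independent columns, so every tau is near 0
theta_hat = pairwise_kendall_vector(sample)
print(theta_hat.shape)  # -> (10,)
print(np.max(np.abs(theta_hat)) <= 0.2)  # is the sup-norm below Delta = 0.2?
```

The pairwise loop costs on the order of $d^2 n^2$ operations, which is acceptable for a toy example but not for large $n$ and $d$.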

Paper Structure

This paper contains 27 sections, 13 theorems, 128 equations, 10 figures and 8 algorithms.

Key Result

Lemma 2.4

Let $T$ denote an $\mathbb{R}^d$-valued statistic with $\ell_2$-sensitivity $\Delta_2 T := \sup_{X \sim X'} \| T(X) - T(X') \|_2$, where the supremum is taken over neighboring datasets $X \sim X'$. Then the Gaussian mechanism $\mathcal{M}(X) = T(X) + \frac{\Delta_2 T}{\sqrt{2 \rho}} Y$, where $Y \sim \mathcal{N}_d(0, I_{d \times d})$, preserves $\rho$-zCDP.
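
The calibration in Lemma 2.4 translates directly into code. A minimal sketch, assuming NumPy; the function names are hypothetical, and the caller must supply a valid upper bound `l2_sensitivity` on $\Delta_2 T$ (deriving that bound for a particular statistic is not covered here). It uses ordinary floating-point Gaussian sampling, which is fine for illustration but is not a hardened DP implementation.

```python
import numpy as np

def gaussian_noise_scale(l2_sensitivity, rho):
    """Noise standard deviation Delta_2 T / sqrt(2 rho) from Lemma 2.4."""
    if l2_sensitivity < 0 or rho <= 0:
        raise ValueError("need l2_sensitivity >= 0 and rho > 0")
    return l2_sensitivity / np.sqrt(2.0 * rho)

def gaussian_mechanism(t_value, l2_sensitivity, rho, rng):
    """Release T(X) + sigma * Y with Y ~ N_d(0, I).

    This is rho-zCDP only if `l2_sensitivity` really bounds Delta_2 T.
    """
    t_value = np.asarray(t_value, dtype=float)
    sigma = gaussian_noise_scale(l2_sensitivity, rho)
    return t_value + sigma * rng.standard_normal(t_value.shape)

rng = np.random.default_rng(0)
released = gaussian_mechanism([0.12, -0.30, 0.05], l2_sensitivity=0.01, rho=0.1, rng=rng)
print(gaussian_noise_scale(1.0, 0.1))  # about 2.236, i.e. sqrt(5)
```

For instance, at $\rho = 0.1$, the strongest privacy level in Figures 5 and 6, the noise standard deviation is $\Delta_2 T/\sqrt{0.2} = \sqrt{5}\,\Delta_2 T \approx 2.24\,\Delta_2 T$.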

Figures (10)

  • Figure 1: Histogram of pairwise absolute Kendall's $\tau$ coefficients between different genomes from the 1000 Genomes Project data (1000GenomesProjectConsortium2015), restricted to the 21.55 Mb to 21.65 Mb window of chromosome 22.
  • Figure 2: Algorithm \ref{alg:HD_test}
  • Figure 3: Hoeffding-based test \ref{eq:hoeffding_test}
  • Figure 5: Empirical rejection probabilities of the test defined by Algorithm \ref{alg:HD_test} for privacy parameters $\rho = 0.1, 0.25, 1$ and models F1 (first row) and F2 (second row) with $n \in \{250, 500, 1000\}$ and $p = d(d-1)/2 \approx n$, where $d = \lceil\sqrt{2n}\rceil$ (moderate-dimensional regime).
  • Figure 6: Empirical rejection probabilities of the test defined by Algorithm \ref{alg:HD_test} for privacy parameters $\rho = 0.1, 0.25, 1$ and models F1 (first row) and F2 (second row) with $n \in \{250, 500, 1000\}$ and $p = d(d-1)/2$, where $d = n$ (high-dimensional regime).
  • ...and 5 more figures
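
The dimension choices stated in the captions of Figures 5 and 6 can be checked with a short calculation. This sketch only reproduces the arithmetic $d = \lceil\sqrt{2n}\rceil$ (or $d = n$) and $p = d(d-1)/2$; the function names are hypothetical.

```python
import math

def moderate_regime(n):
    """d = ceil(sqrt(2n)) and p = d(d-1)/2, so that p is close to n (Figure 5)."""
    s = math.isqrt(2 * n)
    d = s if s * s == 2 * n else s + 1  # exact integer ceil(sqrt(2n))
    return d, d * (d - 1) // 2

def high_dimensional_regime(n):
    """d = n and p = d(d-1)/2 (Figure 6)."""
    return n, n * (n - 1) // 2

for n in (250, 500, 1000):
    print(n, moderate_regime(n), high_dimensional_regime(n))
# 250 (23, 253) (250, 31125)
# 500 (32, 496) (500, 124750)
# 1000 (45, 990) (1000, 499500)
```

In the moderate regime $p$ stays within a few percent of $n$, whereas in the high-dimensional regime $p$ grows quadratically in $n$.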

Theorems & Definitions (28)

  • Remark 2.1
  • Example 2.2
  • Definition 2.3: Approximate zCDP (Definition 8.1 in bun2016concentrated)
  • Lemma 2.4
  • Proposition 2.5
  • Remark 2.6
  • Theorem 4.1
  • Theorem 4.3
  • Theorem 4.4
  • Remark 4.5
  • ...and 18 more