Practical Kernel Tests of Conditional Independence

Roman Pogodin; Antonin Schrab; Yazhe Li; Danica J. Sutherland; Arthur Gretton

TL;DR

The paper tackles kernel-based conditional independence (CI) testing and the bias introduced by conditional mean embedding (CME) estimation. It introduces SplitKCI, a debiased CI statistic that uses independent data splits to reduce CME-induced bias, paired with a train/test split heuristic that balances Type I and Type II errors. The authors prove consistency and wild bootstrap validity for SplitKCI, and show in synthetic and real-data experiments that SplitKCI maintains the nominal level while achieving higher power than existing kernel-based and non-kernel CI tests. The result is a practical, data-efficient CI test with strong level control and competitive sensitivity, applicable to complex, nonlinear dependencies. The paper also discusses extensions, parametric alternatives, and interpretability considerations for sensitive domains.

Abstract

We describe a data-efficient, kernel-based approach to statistical testing of conditional independence. A major challenge of conditional independence testing is to obtain the correct test level (the specified upper bound on the rate of false positives), while still attaining competitive test power. Excess false positives arise due to bias in the test statistic, which is in our case obtained using nonparametric kernel ridge regression. We propose SplitKCI, an automated method for bias control for the Kernel-based Conditional Independence (KCI) test based on data splitting. We show that our approach significantly improves test level control for KCI without sacrificing test power, both theoretically and for synthetic and real-world data.
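
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of a KCI-style test: conditional mean embeddings are fit by kernel ridge regression on training points, residual Gram matrices are evaluated on held-out test points, and a p-value is computed with a wild bootstrap. It is an illustration under stated assumptions, not the authors' implementation. The function names (`rbf_kernel`, `residual_gram`, `kci_wild_bootstrap`), the Gaussian kernel with fixed bandwidth, the ridge parameter `lam`, the Rademacher bootstrap weights, and the choice to fit the two regressions on disjoint training subsets are all hypothetical; the precise definition of SplitKCI and the train/test split heuristic are given in the paper and are not reproduced here.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def residual_gram(C_tr, V_tr, C_te, V_te, lam=1e-2):
    """Gram matrix of phi(v_i) - mu_hat_{V|C}(c_i) over the test points.

    mu_hat is a kernel ridge regression estimate of the conditional mean
    embedding, fit on the training points only (lam is an assumed value).
    """
    m = C_tr.shape[0]
    K_cc = rbf_kernel(C_tr, C_tr)
    # Column i of W holds the ridge weights for test point c_i.
    W = np.linalg.solve(K_cc + m * lam * np.eye(m), rbf_kernel(C_tr, C_te))
    cross = rbf_kernel(V_te, V_tr) @ W  # cross[i, j] = <phi(v_i), mu_hat(c_j)>
    return rbf_kernel(V_te, V_te) - cross - cross.T + W.T @ rbf_kernel(V_tr, V_tr) @ W

def kci_wild_bootstrap(A, B, C, train_a, train_b, test, n_boot=500, seed=0):
    """KCI-style statistic on the test indices and a wild bootstrap p-value."""
    R_a = residual_gram(C[train_a], A[train_a], C[test], A[test])
    R_b = residual_gram(C[train_b], B[train_b], C[test], B[test])
    K_c = rbf_kernel(C[test], C[test])
    M = K_c * R_a * R_b  # elementwise product of the three Gram matrices
    n = len(test)
    stat = M.sum() / n**2  # V-statistic estimate
    rng = np.random.default_rng(seed)
    Q = rng.choice([-1.0, 1.0], size=(n_boot, n))  # Rademacher signs (assumed)
    boot = ((Q @ M) * Q).sum(axis=1) / n**2  # q^T M q / n^2 for each draw q
    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, p_value

# Toy data in which A and B depend on C only, so A is independent of B given C.
rng = np.random.default_rng(0)
N = 300
C = rng.normal(size=(N, 1))
A = np.sin(C) + 0.3 * rng.normal(size=(N, 1))
B = np.cos(C) + 0.3 * rng.normal(size=(N, 1))
idx = rng.permutation(N)
stat, p_value = kci_wild_bootstrap(A, B, C, idx[:100], idx[100:200], idx[200:])
print(stat, p_value)
```

Here 200 points are used for training (100 per regression) and 100 for testing; this ratio is arbitrary. In practice the kernel bandwidths and ridge parameters would be selected from the data rather than fixed.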

Paper Structure (41 sections, 8 theorems, 73 equations, 20 figures, 3 algorithms)

Introduction
Related work
Introduction to regression-based conditional independence testing
Kernel-based measures of conditional dependence
Conditional mean embeddings via kernel ridge regression
SplitKCI
Reducing CME estimation bias in KCI
A heuristic for choosing train/test split ratio for SplitKCI
Statistical testing with SplitKCI
Wild bootstrap for computing p-values
Full test for SplitKCI
Experiments
Description of tasks and the experimental setup
Influence of train/test splitting on KCI-style methods for a post-nonlinear model
Methods comparison for a post-nonlinear model
...and 26 more sections

Key Result

Theorem 2

Each of the following conditions holds if and only if $A \perp \!\!\! \perp B \,|\, C$ (the condition eq:daudin-one-and-a-half is not explicitly stated by daudin1980partial, but follows from eq:daudin-one; see pogodin2022efficient, summarised in app:sec:kci), where in each case the test functions $h$ a

Figures (20)

Figure 1: RatInABox data visualisation. A. Simulated rat trajectory in a square box with a blocked off centre. B. Activation patterns of three grid cells w.r.t. rat's position in the box. C. Simulated neural activity of the cells in B. Black dots indicate which data points are used in the dataset. D. Activation patterns of three head direction cells w.r.t. rat's head direction (polar coordinates). E. Same as C, but for the head direction cells.
Figure 2: Train/test splitting in a post-nonlinear model with $d=4$ and a fixed dataset size $N=n+m$ (for $n$ test and $m$ training points). A. Type I error (top) vs. rejection rate for the train/test split heuristic (bottom) for different dataset sizes $N$ and test-to-train split ratios. B. Type II error (top) vs. rejection rate for the train/test split heuristic (bottom) for different dataset sizes $N$ and split ratios. Lines/shaded area: mean/$\pm$SE over 100 trials, $\alpha=0.05$. Asterisks: approximate split ratios from \ref{alg:train_test_split} to show performance of the splitting heuristic.
Figure 3: Post-nonlinear model experiments for increasing dimensionality of the task and a fixed dataset size $N=n+m$ (for $n$ test and $m$ training points). A. Type I error for $N=200$ (left) and $N=400$ (right) data points. B. Type II error for $N=200$ (left) and $N=400$ (right) data points. Lines/shaded area: mean/$\pm$SE over 100 trials, $\alpha=0.05$.
Figure 4: Train/test splitting in the synthetic neural data. Type I error (top) vs. rejection rate for the train/test split heuristic (bottom) for different dataset sizes $N=n+m$ and test-to-train ($n/m$) split ratios. Lines/shaded area: mean/$\pm$SE over 100 trials, $\alpha=0.05$. Asterisks: approximate split ratios from \ref{alg:train_test_split} to show performance of the splitting heuristic.
Figure 5: Train/test splitting in the synthetic neural data. Type II error (top) vs. rejection rate for the train/test split heuristic (bottom) for different dataset sizes $N=n+m$ and test-to-train ($n/m$) split ratios. Lines/shaded area: mean/$\pm$SE over 100 trials, $\alpha=0.05$. Asterisks: approximate split ratios from \ref{alg:train_test_split} to show performance of the splitting heuristic.
...and 15 more figures

Theorems & Definitions (11)

Definition 1: daudin1980partial
Theorem 2: daudin1980partial
Theorem 3
Definition 4: SplitKCI
Definition 5: $\beta$-interpolation space, steinwart2012mercer, li2023optimal
Theorem 6: Bias in KCI and SplitKCI
Theorem 7: Wild bootstrap
Theorem 7
Theorem 7: Bias in KCI and SplitKCI
Theorem 7: Wild bootstrap
...and 1 more
