Practical Kernel Tests of Conditional Independence
Roman Pogodin, Antonin Schrab, Yazhe Li, Danica J. Sutherland, Arthur Gretton
TL;DR
The paper tackles kernel-based conditional independence (CI) testing and the bias introduced by conditional mean embedding (CME) estimation. It introduces SplitKCI, a debiased CI statistic that uses independent data splits to reduce CME-induced bias, paired with a train/test split heuristic to balance Type I and Type II errors. The authors prove consistency and wild bootstrap validity for SplitKCI, and demonstrate through extensive synthetic and real-data experiments that SplitKCI maintains the nominal level while achieving higher power than existing kernel-based and non-kernel CI tests. The approach yields practical, data-efficient CI testing with strong level control and competitive sensitivity, applicable to complex, nonlinear dependencies. It also discusses extensions, parametric alternatives, and interpretability considerations for sensitive domains.
Abstract
We describe a data-efficient, kernel-based approach to statistical testing of conditional independence. A major challenge of conditional independence testing is to obtain the correct test level (the specified upper bound on the rate of false positives), while still attaining competitive test power. Excess false positives arise due to bias in the test statistic, which is in our case obtained using nonparametric kernel ridge regression. We propose SplitKCI, an automated method for bias control for the Kernel-based Conditional Independence (KCI) test based on data splitting. We show that our approach significantly improves test level control for KCI without sacrificing test power, both theoretically and for synthetic and real-world data.
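To make the ingredients concrete, here is a minimal NumPy sketch of a KCI-style test in which the kernel ridge regressions are fit on a held-out training split and the statistic is computed on the remaining test points, with a wild bootstrap p-value. It is an illustration under simplifying assumptions, not the authors' SplitKCI implementation: the function names (`rbf`, `residual_gram`, `ci_test`), the Gaussian kernel with fixed bandwidth, the ridge parameter `lam`, and the single train/test split are all hypothetical choices, and the sketch omits both the kernel on the conditioning variable and the additional independent splits that SplitKCI uses for debiasing.

```python
import numpy as np

def rbf(a, b, bw=1.0):
    # Gaussian kernel matrix between the rows of a and the rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def residual_gram(x_tr, z_tr, x_te, z_te, lam=1e-2):
    # Gram matrix of the residual features phi(x_i) - mu_hat(z_i) on the test
    # points, where mu_hat is a kernel ridge regression estimate of the
    # conditional mean embedding, fit on the training split only.
    n = len(z_tr)
    W = np.linalg.solve(rbf(z_tr, z_tr) + lam * n * np.eye(n), rbf(z_tr, z_te))
    cross = rbf(x_te, x_tr) @ W            # <phi(x_i), mu_hat(z_j)>
    return rbf(x_te, x_te) - cross - cross.T + W.T @ rbf(x_tr, x_tr) @ W

def ci_test(x, y, z, n_train, n_boot=500, seed=0):
    # Assumed simplification: one train/test split, no kernel on z in the statistic.
    rng = np.random.default_rng(seed)
    tr, te = slice(0, n_train), slice(n_train, None)
    Kx = residual_gram(x[tr], z[tr], x[te], z[te])
    Ky = residual_gram(y[tr], z[tr], y[te], z[te])
    M = Kx * Ky                            # elementwise product of residual Grams
    m = M.shape[0]
    stat = M.sum() / m ** 2
    # Wild bootstrap: reweight the statistic with random Rademacher signs.
    q = rng.choice([-1.0, 1.0], size=(n_boot, m))
    boot = np.einsum("bi,ij,bj->b", q, M, q) / m ** 2
    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, p_value

# Toy data in which X and Y are conditionally independent given Z.
rng = np.random.default_rng(1)
z = rng.normal(size=(300, 1))
x = z + 0.5 * rng.normal(size=(300, 1))
y = z + 0.5 * rng.normal(size=(300, 1))
stat, p_value = ci_test(x, y, z, n_train=200)
```

Fitting the regressions on data that the statistic never sees is the simplest form of data splitting. How the data are divided between regression and testing is the trade-off that the train/test split heuristic in the TL;DR addresses.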
