The Generalised Kernel Covariance Measure

Luca Bergen, Dino Sejdinovic, Vanessa Didelez

Abstract

We consider the problem of conditional independence (CI) testing and adopt a kernel-based approach. Kernel-based CI tests embed variables in reproducing kernel Hilbert spaces, regress their embeddings on the conditioning variables, and test the resulting residuals for marginal independence. This approach yields tests that are sensitive to a broad range of conditional dependencies. Existing methods, however, rely heavily on kernel ridge regression, which is computationally expensive when properly tuned and yields poorly calibrated tests when left untuned; this limits their practical usefulness. We propose the Generalised Kernel Covariance Measure (GKCM), a regression-model-agnostic kernel-based CI test that accommodates a broad class of regression estimators. Building on the Generalised Hilbertian Covariance Measure framework (Lundborg et al., 2022), we characterise conditions under which GKCM satisfies uniform asymptotic level guarantees. In simulations, GKCM paired with tree-based regression models frequently outperforms state-of-the-art CI tests across a diverse range of data-generating processes, achieving better type I error control and competitive or superior power.
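
To make this recipe concrete, the sketch below implements the real-valued special case of the residual-based approach: regress $X$ and $Y$ on $Z$, form products of the residuals, and compare a normalised sample covariance against its asymptotic null distribution. It is a minimal illustration rather than the paper's implementation: the `gcm_test` helper, the random-forest regressors, and the toy data are assumptions made here for concreteness, scalar responses stand in for the RKHS embeddings that GKCM uses, and in-sample residuals are used where a careful test would tune or cross-fit the regressions.

```python
# Minimal sketch of a residual-based CI test (real-valued special case).
# Any regressor with fit/predict can be plugged in; random forests stand in
# for the tree-based models the paper pairs with GKCM in its simulations.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

def gcm_test(x, y, z, seed=0):
    """p-value for H0: X independent of Y given Z, via residual products."""
    # Regress each response on Z and take residuals (in-sample for brevity;
    # sample splitting / cross-fitting is preferable for calibrated levels).
    rx = x - RandomForestRegressor(random_state=seed).fit(z, x).predict(z)
    ry = y - RandomForestRegressor(random_state=seed).fit(z, y).predict(z)
    r = rx * ry                                  # residual products
    stat = np.sqrt(len(r)) * r.mean() / r.std()  # approx. N(0, 1) under H0
    return 2 * norm.sf(abs(stat))                # two-sided p-value

# Toy null example: X and Y depend on each other only through Z.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 1))
x = z[:, 0] + 0.5 * rng.normal(size=500)
y = z[:, 0] ** 2 + 0.5 * rng.normal(size=500)
print(gcm_test(x, y, z))  # p-values should be roughly uniform under H0
```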

Paper Structure

This paper contains 22 sections, 2 theorems, 41 equations, 5 figures, and 1 table.

Key Result

Lemma 1

Let $\mathcal{X}$ and $\mathcal{F}$ denote Polish spaces. For every random variable $X$ taking values in the standard Borel space $(\mathcal{X}, \mathcal{B}_X)$ and every Borel-measurable and injective function $\phi: \mathcal{X} \to \mathcal{F}$, it holds that $\sigma(X) = \sigma(\phi(X))$.
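
The relevance of this lemma to kernel-based testing is that an injective, measurable embedding preserves exactly the information in $X$; for instance, the canonical feature map $x \mapsto k(\cdot, x)$ of a characteristic kernel is injective, so conditioning on the embedded variable is equivalent to conditioning on $X$ itself. The sketch below records the standard descriptive-set-theoretic argument for the two inclusions; it is an assumption about how the proof proceeds, not the paper's own derivation.

```latex
% (1) phi is Borel, so every event generated by phi(X) is generated by X:
\sigma(\phi(X))
  = \bigl\{ X^{-1}\bigl(\phi^{-1}(B)\bigr) : B \in \mathcal{B}_{\mathcal{F}} \bigr\}
  \subseteq \bigl\{ X^{-1}(A) : A \in \mathcal{B}_X \bigr\} = \sigma(X).
% (2) By the Lusin--Suslin theorem, an injective Borel map between Polish
% spaces sends Borel sets to Borel sets, so phi(A) is Borel for A in B_X and
X^{-1}(A) = (\phi \circ X)^{-1}\bigl(\phi(A)\bigr) \in \sigma(\phi(X)).
```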

Figures (5)

  • Figure 1: Rejection rates in the null settings with rejection threshold $p < 0.05$ (100 iterations). Error bars indicate 95% Wilson confidence intervals and dashed lines the nominal level.
  • Figure 2: Rejection rates in the alternative settings with rejection threshold $p < 0.05$ (100 iterations). Error bars indicate 95% Wilson confidence intervals.
  • Figure 3: Rejection rates in the scenarios of Zhang et al. (2011) (100 iterations).
  • Figure 4: Rejection rates in the null settings with rejection threshold $p < 0.05$ (100 iterations). Error bars indicate 95% Wilson confidence intervals and dashed lines the nominal level.
  • Figure 5: Rejection rates in the alternative settings with rejection threshold $p < 0.05$ (100 iterations). Error bars indicate 95% Wilson confidence intervals.

Theorems & Definitions (2)

  • Lemma 1
  • Theorem 1