Conditional Dependence via U-Statistics Pruning
Ferran de Cabrera, Marc Vilà-Insa, Jaume Riba
TL;DR
The paper develops a kernel-based measure of conditional dependence, C-HSIC, by reframing marginal dependence as finite-dimensional correlation through steering-vector mappings and then leveraging incomplete U-statistics to prune data according to a confounder. This yields a practical, inversion-free statistic, C-HSIC$_\alpha$, that scales as $O(L^2)$ and reduces to the classical HSIC when no pruning occurs. Through numerical experiments, the method demonstrates the ability to detect conditional independence and dependence in controlled scenarios, while highlighting the role of the pruning parameter $\alpha$ and the trade-off between conditioning strength and data usage. The approach offers a conceptually new bridge between kernel methods and U-statistics, with potential impact on causal discovery tasks where conditioning on confounders is essential and matrix inversions are prohibitive.
Abstract
The problem of measuring conditional dependence between two random phenomena arises when a third one (a confounder) has a potential influence on the amount of information shared between them. A typical issue in this challenging problem is the inversion of ill-conditioned autocorrelation matrices. This paper presents a novel measure of conditional dependence based on incomplete unbiased statistics (U-statistics) of degree two, which make it possible to reinterpret independence as uncorrelatedness in a finite-dimensional feature space. This formulation allows the data to be pruned according to observations of the confounder itself, thus avoiding matrix inversions altogether. The proposed approach is articulated as an extension of the Hilbert-Schmidt independence criterion, which becomes expressible through kernels that operate on 4-tuples of data.
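The idea of pruning a degree-two statistic according to the confounder can be illustrated with a minimal Python sketch. It is not the paper's C-HSIC estimator: it assumes a plain covariance kernel, $h = (x_i - x_j)(y_i - y_j)/2$, in place of the paper's feature mappings. The function name `pruned_pair_covariance`, the rule of keeping the fraction `alpha` of sample pairs with the closest confounder values, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def pruned_pair_covariance(x, y, z, alpha=1.0):
    """Incomplete degree-two U-statistic with a covariance kernel.

    Averages h = (x_i - x_j) * (y_i - y_j) / 2 over the fraction `alpha`
    of sample pairs whose confounder values z_i and z_j are closest.
    Illustrative assumption only, not the paper's statistic.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    i, j = np.triu_indices(len(x), k=1)        # all L(L-1)/2 unordered pairs
    h = 0.5 * (x[i] - x[j]) * (y[i] - y[j])    # kernel value for each pair
    gap = np.abs(z[i] - z[j])                  # confounder distance per pair
    n_keep = min(h.size, max(1, int(np.ceil(alpha * h.size))))
    keep = np.argpartition(gap, n_keep - 1)[:n_keep]  # pairs with smallest gaps
    return h[keep].mean()

# Synthetic example (assumed): x and y depend on each other only through z.
rng = np.random.default_rng(0)
z = rng.normal(size=400)
x = z + 0.5 * rng.normal(size=400)
y = z + 0.5 * rng.normal(size=400)

full = pruned_pair_covariance(x, y, z, alpha=1.0)     # no pruning
pruned = pruned_pair_covariance(x, y, z, alpha=0.05)  # keep closest 5% of pairs
print(f"no pruning: {full:.3f}, pruned: {pruned:.3f}")
```

With `alpha=1.0` every pair is retained and the sketch coincides with the ordinary unbiased sample covariance, mirroring how the proposed statistic is said to reduce to the classical HSIC when no pruning occurs. Smaller values of `alpha` condition more tightly on the confounder but use fewer pairs, and enumerating all pairs costs $O(L^2)$ time and memory in this sketch.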
