Nyström $M$-Hilbert-Schmidt Independence Criterion

Florian Kalinke, Zoltán Szabó

TL;DR

The paper introduces the Nyström $M$-HSIC (N-MHSIC), a scalable estimator of the Hilbert-Schmidt independence criterion that supports joint independence testing of $M\ge 2$ random variables and comes with consistency guarantees. It estimates the mean embeddings with Nyström approximations and combines them in the tensor-product RKHS, giving a runtime of $O(M{n'}^3+Mn'n)$ and a convergence rate of $O(n^{-1/2})$ when the number of Nyström points satisfies $n' = \tilde{\Theta}(\sqrt{n})$. The authors prove an error propagation lemma for tensor products and show, on synthetic and real data, that N-MHSIC is competitive in statistical power with the quadratic-time V-statistic-based HSIC estimator (V-HSIC) while running substantially faster. This makes joint independence testing, multiway dependency testing, and causal discovery (DAG inference) practical on large, multi-component data sets.
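To make the runtime claim concrete, here is a minimal Python sketch of one way such an estimator can be computed. It is not the authors' implementation. It assumes the standard definition of HSIC as the squared RKHS distance between the embedding of the joint distribution and the tensor product of the marginal embeddings, Gaussian kernels with one shared bandwidth, Nyström points shared across all components, and pseudo-inverse-based Nyström weights. The names `gaussian_gram` and `nystrom_mhsic` are hypothetical.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (p x d) and b (q x d)."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_mhsic(components, idx, bandwidth=1.0):
    """Squared Nystrom M-HSIC (illustrative sketch, assumptions in the text).

    components: list of M arrays of shape (n, d_m); row i is the i-th joint sample.
    idx: indices of the n' Nystrom points, shared by all components (assumption).
    """
    n = components[0].shape[0]
    n_sub = len(idx)
    joint_sub = np.ones((n_sub, n_sub))   # Hadamard product of the n' x n' Gram matrices
    joint_cross = np.ones((n_sub, n))     # Hadamard product of the n' x n Gram matrices
    marginal_norms = 1.0                  # product over m of ||marginal embedding||^2
    cross_term = np.ones(n_sub)           # elementwise product over m of K_m @ alpha_m
    for x in components:
        k_sub = gaussian_gram(x[idx], x[idx], bandwidth)
        k_cross = gaussian_gram(x[idx], x, bandwidth)
        # Nystrom weights of the m-th marginal mean embedding
        alpha_m = np.linalg.pinv(k_sub) @ k_cross.mean(axis=1)
        marginal_norms *= alpha_m @ k_sub @ alpha_m
        cross_term *= k_sub @ alpha_m
        joint_sub *= k_sub
        joint_cross *= k_cross
    # Nystrom weights of the joint embedding under the product kernel
    alpha_joint = np.linalg.pinv(joint_sub) @ joint_cross.mean(axis=1)
    return (alpha_joint @ joint_sub @ alpha_joint
            + marginal_norms
            - 2.0 * alpha_joint @ cross_term)
```

In this sketch each component costs one $n'\times n'$ pseudo-inverse and one $n'\times n$ cross Gram matrix, which is consistent with the stated $O(M{n'}^3+Mn'n)$ runtime. When `idx` covers all $n$ points and the Gram matrices are invertible, the value coincides with the quadratic-time V-statistic.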

Abstract

Kernel techniques are among the most popular and powerful approaches in data science. Among the key features that make kernels ubiquitous are (i) the number of domains they have been designed for, (ii) the Hilbert structure of the function class associated with kernels, which facilitates their statistical analysis, and (iii) their ability to represent probability distributions without loss of information. These properties give rise to the immense success of the Hilbert-Schmidt independence criterion (HSIC), which is able to capture joint independence of random variables under mild conditions and permits closed-form estimators with quadratic computational complexity (w.r.t. the sample size). In order to alleviate the quadratic computational bottleneck in large-scale applications, multiple HSIC approximations have been proposed; however, these estimators are restricted to $M=2$ random variables, do not extend naturally to the $M\ge 2$ case, and lack theoretical guarantees. In this work, we propose an alternative Nyström-based HSIC estimator which handles the $M\ge 2$ case, prove its consistency, and demonstrate its applicability in multiple contexts, including synthetic examples, dependency testing of media annotations, and causal discovery.

Paper Structure (21 sections, 11 theorems, 60 equations, 5 figures, 1 table)

Key Result

Lemma 4.1

For a kernel $\ell$ with corresponding feature map $\phi_\ell$, an i.i.d. sample $\hat{\mathbb{Q}}_n$ of distribution $\mathbb{Q}$, and a subsample $\tilde{\mathbb{Q}}_{n'}$ of $\hat{\mathbb{Q}}_n$, the Nyström estimate of $\mu_{\ell}(\mathbb{Q})$ is given by … with Gram matrix $\mathbf{K}_{\ell,n'n'} = \left[\ell(\tilde{x}^i,\tilde{x}^j)\right]_{i,j\in[n']} \in \mathbb{R}^{n'\times n'}$, and …
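The statement above does not show the weights explicitly. As a hedged illustration only, the sketch below uses a commonly used form of Nyström mean-embedding weights, $\alpha = \frac{1}{n}\mathbf{K}_{\ell,n'n'}^{-}\mathbf{K}_{\ell,n'n}\mathbf{1}_n$ with cross Gram matrix $\mathbf{K}_{\ell,n'n} = [\ell(\tilde{x}^i, x^j)]$, so that the estimate is $\sum_{i\in[n']}\alpha^i\phi_\ell(\tilde{x}^i)$. This form is an assumption and is not quoted from the lemma; the Gaussian kernel and the function names `gaussian_gram`, `nystrom_weights`, and `embedding_at` are likewise hypothetical choices.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (p x d) and b (q x d)."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_weights(x, idx, bandwidth=1.0):
    """Weights alpha of the Nystrom mean embedding (assumed pseudo-inverse form).

    x: full sample of shape (n, d); idx: indices of the n' Nystrom points.
    """
    x_sub = x[idx]
    k_sub = gaussian_gram(x_sub, x_sub, bandwidth)   # n' x n' Gram matrix
    k_cross = gaussian_gram(x_sub, x, bandwidth)     # n' x n cross Gram matrix
    # k_cross.mean(axis=1) equals (1/n) * K_{n'n} @ 1_n
    return np.linalg.pinv(k_sub) @ k_cross.mean(axis=1)


def embedding_at(queries, x, idx, alpha, bandwidth=1.0):
    """Evaluate the estimated embedding, sum_i alpha_i * l(x_sub_i, t), at each query t."""
    return gaussian_gram(queries, x[idx], bandwidth) @ alpha
```

Under this form the estimate is the orthogonal projection of the empirical mean embedding onto the span of the $n'$ subsampled feature vectors, so choosing all $n$ points with an invertible Gram matrix recovers the uniform weights $1/n$.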

Figures (5)

  • Figure 1: Estimation accuracy for $M=2$ components; the theoretical HSIC value is zero.
  • Figure 2: Power on dependent data. Runtime on log scale.
  • Figure 3: Ratio of correctly identified DAGs with $4$ nodes.
  • Figure 4: Test power vs. runtime on the Million Song Data.
  • Figure 5: Testing for joint independence on the residuals of DAGs with three nodes (left) and the DAG with the largest $p$-value (right). The $p$-values agree on DAGs $1$ to $24$.

Theorems & Definitions (16)

  • Lemma 4.1: Nyström mean embedding (Chatalic et al., 2022)
  • Lemma 4.2: Computation of Nyström $M$-HSIC
  • Remark 1
  • Lemma 4.3: Error propagation on tensor products
  • Proposition 4.1: Error bound for Nyström $M$-HSIC
  • Lemma 4.4: Deviation bound for V-statistic based HSIC estimator
  • Remark 2
  • Theorem A.1: Bound on mean embeddings
  • Theorem A.2: Hoeffding's inequality for U-statistics
  • Lemma A.1: Connection between U- and V-statistics
  • ...and 6 more