
Testing independence and conditional independence in high dimensions via coordinatewise Gaussianization

Jinyuan Chang, Yue Du, Jing He, Qiwei Yao

Abstract

We propose new statistical tests, in high-dimensional settings, for testing the independence of two random vectors and their conditional independence given a third random vector. The key idea is simple: we first transform each component variable to the standard normal via its marginal empirical distribution, and then test for independence and conditional independence of the transformed random vectors using appropriate $L_\infty$-type test statistics. Although the new tests only check certain necessary conditions for independence or conditional independence, they outperform 13 frequently used testing methods in a large-scale simulation comparison. The advantages of the new tests are as follows: (i) they do not require any moment conditions, (ii) they allow arbitrary dependence structures among the components of the random vectors, and (iii) they allow the dimensions of the random vectors to diverge at exponential rates of the sample size. The critical values of the proposed tests are determined by a computationally efficient multiplier bootstrap procedure. Theoretical analysis shows that the sizes of the proposed tests are well controlled by the nominal significance level, and that the tests are consistent under certain local alternatives. The finite-sample performance of the new tests is illustrated via extensive simulation studies and a real data application.
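The unconditional independence test described above can be sketched as follows. This is a minimal, hypothetical sketch in pure Python, not the authors' implementation. The names `gaussianize`, `multiplier_bootstrap_cv` and `independence_test` are illustrative. The code makes three assumptions: the empirical CDF is evaluated at rank/(n+1); the $L_\infty$-type statistic is the maximum absolute scaled cross-covariance $\max_{i,j} |n^{-1/2} \sum_t \hat z^{x}_{t,i} \hat z^{y}_{t,j}|$ of the Gaussianized components; and the multipliers are either Gaussian or Rademacher. Such a statistic only checks zero cross-covariance after Gaussianization, which is a necessary condition for independence, not a sufficient one.

```python
import math
import random
from statistics import NormalDist

# Standard normal quantile function Phi^{-1}.
PHI_INV = NormalDist().inv_cdf


def gaussianize(column):
    """Map one variable to approximately N(0, 1) via its marginal empirical CDF.

    Assumption: the empirical CDF is evaluated at rank / (n + 1), which keeps the
    argument of Phi^{-1} strictly inside (0, 1). Ties are broken by position.
    """
    n = len(column)
    order = sorted(range(n), key=lambda t: column[t])
    z = [0.0] * n
    for rank, t in enumerate(order, start=1):
        z[t] = PHI_INV(rank / (n + 1))
    return z


def multiplier_bootstrap_cv(products, alpha, n_boot, multiplier, rng):
    """Empirical (1 - alpha) quantile of the multiplier-bootstrap maximum."""
    n = len(products[0])
    # Center each product series before applying the multipliers.
    centered = []
    for u in products:
        mean_u = sum(u) / n
        centered.append([v - mean_u for v in u])
    maxima = []
    for _ in range(n_boot):
        if multiplier == "rademacher":
            e = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        else:  # Gaussian multipliers
            e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        maxima.append(max(
            abs(sum(et * vt for et, vt in zip(e, c))) / math.sqrt(n)
            for c in centered
        ))
    maxima.sort()
    k = min(max(math.ceil((1 - alpha) * n_boot) - 1, 0), n_boot - 1)
    return maxima[k]


def independence_test(X, Y, alpha=0.05, n_boot=500, multiplier="gaussian", seed=0):
    """Hypothetical L_inf-type test of independence between X (n x p) and Y (n x q).

    Rows are observations. Returns (statistic, critical value, reject?).
    """
    rng = random.Random(seed)
    n = len(X)
    zx = [gaussianize(col) for col in zip(*X)]  # p Gaussianized columns
    zy = [gaussianize(col) for col in zip(*Y)]  # q Gaussianized columns
    # One product series per pair (i, j); since the normal scores have sample
    # mean (approximately) zero, its mean estimates Cov(Z_i^x, Z_j^y).
    products = [[a * b for a, b in zip(xi, yj)] for xi in zx for yj in zy]
    stat = max(abs(sum(u)) / math.sqrt(n) for u in products)
    cv = multiplier_bootstrap_cv(products, alpha, n_boot, multiplier, rng)
    return stat, cv, stat > cv


if __name__ == "__main__":
    rng = random.Random(1)

    def cauchy():
        # Heavy-tailed draws with no finite moments.
        return math.tan(math.pi * (rng.random() - 0.5))

    n, p, q = 200, 3, 3
    X = [[cauchy() for _ in range(p)] for _ in range(n)]
    Y_indep = [[cauchy() for _ in range(q)] for _ in range(n)]
    # The first component of Y depends on the first component of X.
    Y_dep = [[row[0] + 0.1 * cauchy()] + [cauchy() for _ in range(q - 1)] for row in X]
    print(independence_test(X, Y_indep))
    print(independence_test(X, Y_dep, multiplier="rademacher"))
```

The transform depends on the data only through marginal ranks. The sketch therefore runs unchanged on Cauchy data, which illustrates point (i) above. The conditional independence tests need additional steps that this sketch does not attempt to cover.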

Paper Structure

This paper contains 115 sections, 33 theorems, 770 equations, 2 figures, 12 tables, and 1 algorithm.

Key Result

Theorem 1

Let $p \lesssim n^{\varkappa_1}$ and $q \lesssim n^{\varkappa_2}$ for any given constants $\varkappa_1 \geq 0$ and $\varkappa_2 \geq 0$. Under the null hypothesis $\mathbb{H}_0$ in eq:equind, $\mathbb{P}(H_{n} > \hat{{\rm cv}}_{{\rm ind},\alpha}) \to \alpha$ as $n \to \infty$.

Figures (2)

  • Figure S1: Conditional dependence network of the 11 sectors (denoted by the nodes) obtained by using the CI-FNN test with Rademacher multiplier. There exists an edge between two nodes if the conditional independence test between them is significant. The sizes of the nodes are proportional to their degrees.
  • Figure S2: Conditional dependence network of the 11 sectors (denoted by the nodes) obtained by using the CI-Lasso test with Rademacher multiplier. There exists an edge between two nodes if the conditional independence test between them is significant. The sizes of the nodes are proportional to their degrees.

Theorems & Definitions (40)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 1
  • Theorem 2
  • Definition 1: $(\vartheta,C)$-smooth function
  • Definition 2: $(\vartheta,C)$-smooth generalized hierarchical interaction model
  • Definition 3
  • Theorem 3
  • ...and 30 more