A Powerful Bootstrap Test of Independence in High Dimensions

Mauricio Olivares, Tomasz Olma, Daniel Wilhelm

TL;DR

This paper develops a high-dimensional, nonparametric test for pairwise independence between a variable $X$ and a large pool of variables $Y_1,\dots,Y_p$ by taking the maximum of Chatterjee's rank correlations $\hat{\xi}_j$ and calibrating via a block multiplier bootstrap on a 1-dependent representation. The test achieves asymptotic size control uniformly over a broad class of data-generating processes, allows $p$ to grow exponentially with $n$, and imposes no restrictions on the dependence among $Y_j$'s; it is also consistent against any fixed alternative and can be coupled with a stepwise procedure to control the family-wise error rate. Through extensive simulations, the authors show robustness to block size, strong performance in high dimensions—especially for sparse or decaying alternatives—and cases where distance-covariance tests fail to control size under the null. An empirical application to gene expression data demonstrates the method’s practical utility for identifying dependent features and obtaining FWER control in large-scale settings. Overall, the approach provides a computationally attractive, scalable framework for independence screening and inference in ultra-high-dimensional problems.

Abstract

This paper proposes a nonparametric test of pairwise independence of one random variable from a large pool of other random variables. The test statistic is the maximum of several Chatterjee's rank correlations and critical values are computed via a block multiplier bootstrap. We show in simulations that other popular tests based on distance covariances do not necessarily control size under this null. Our test, on the other hand, is shown to asymptotically control size uniformly over a large class of data-generating processes, even when the number of variables is much larger than sample size. The test is consistent against any fixed alternative. It can be combined with a stepwise procedure for selecting those variables from the pool that violate independence, while controlling the family-wise error rate. All formal results leave the dependence among variables in the pool completely unrestricted. In simulations, we find that our test is typically more powerful than competing methods (in settings where they are valid), particularly in high-dimensional scenarios or when there is dependence among variables in the pool.
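To make the test statistic concrete, here is a minimal Python sketch of Chatterjee's rank correlation and of its maximum over a pool of variables. The function names `chatterjee_xi` and `max_xi` are hypothetical, and so is the direction of the coefficient (which variable is sorted and which is ranked). The sketch assumes continuous data without ties and omits any scaling or studentisation. It does not implement the block multiplier bootstrap used for critical values.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(x, y), assuming no ties in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    order = np.argsort(x, kind="stable")          # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1  # ranks (1..n) of y in that order
    # xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1)
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)


def max_xi(x, Y):
    """Maximum over columns j of xi_n(x, Y[:, j]); the direction is an assumption."""
    Y = np.asarray(Y, dtype=float)
    return max(chatterjee_xi(x, Y[:, j]) for j in range(Y.shape[1]))


# Illustrative data: 20 pool variables, only column 3 depends on x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
Y = rng.normal(size=(500, 20))
Y[:, 3] = x**2  # non-monotone, noiseless function of x
print(max_xi(x, Y))
```

In this illustrative sample the maximum is driven by the single dependent column, which is the sparse type of alternative that a max-type statistic is designed to pick up.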

Paper Structure

This paper contains 29 sections, 12 theorems, 123 equations, 14 figures, 1 algorithm.

Key Result

Theorem 1

Suppose that Assumptions \ref{ass:continuity}--\ref{ass:rates} hold. Then, under the null hypothesis $H_0$, there exist positive constants $c$, $C$ depending only on $\gamma$ and $C_1$ such that

Figures (14)

  • Figure 1: Bias, standard deviation, and the root mean squared error of $V_j^B$ based on the formulas in Lemma \ref{lemma:VB} together with the optimal choice $q^*(n)$.
  • Figure 2: Optimal choice $q^*(n)$ and the smooth approximation $\tilde{q}(n) \coloneqq (n/16)^{1/3}$.
  • Figure 3: Rejection rates in Models 1--6 with $\tau=0$ under different choices of the big block size $q$.
  • Figure 4: Power curves under linear alternatives in high dimensions.
  • Figure 5: Power curves under cosine alternatives in high dimensions.
  • ...and 9 more figures
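
The smooth approximation $\tilde{q}(n) = (n/16)^{1/3}$ from the Figure 2 caption can be evaluated directly to get a feel for how slowly the big block size grows with $n$. The snippet below is only an illustration of that formula: the function name `q_tilde` is hypothetical, and how the value is rounded to an integer block size is not stated here, so it is left as a float.

```python
def q_tilde(n: int) -> float:
    """Smooth approximation (n / 16) ** (1/3) from the Figure 2 caption."""
    return (n / 16.0) ** (1.0 / 3.0)


# Illustrative sample sizes (not taken from the paper).
for n in (128, 1000, 10000):
    print(n, round(q_tilde(n), 2))
```

For example, $n = 128$ gives $\tilde{q}(n) = 8^{1/3} = 2$, and $n = 1000$ gives a value just under 4.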

Theorems & Definitions (26)

  • Theorem 1
  • Remark 1: discrete distributions
  • Remark 2: bootstrapping Chatterjee's rank correlation
  • Remark 3: studentisation
  • Theorem 2
  • Lemma 1
  • Remark 4: compatibility with rate conditions
  • Theorem 3
  • Lemma 2
  • Proof
  • ...and 16 more