A Powerful Bootstrap Test of Independence in High Dimensions
Mauricio Olivares, Tomasz Olma, Daniel Wilhelm
TL;DR
This paper develops a high-dimensional, nonparametric test for pairwise independence between a variable $X$ and each variable in a large pool $Y_1,\dots,Y_p$. The test statistic is the maximum of Chatterjee's rank correlations $\hat{\xi}_j$, and critical values are calibrated via a block multiplier bootstrap on a 1-dependent representation. The test achieves asymptotic size control uniformly over a broad class of data-generating processes, allows $p$ to grow exponentially with $n$, and imposes no restrictions on the dependence among the $Y_j$; it is also consistent against any fixed alternative and can be coupled with a stepwise procedure to control the family-wise error rate (FWER). Through extensive simulations, the authors show robustness to block size, strong performance in high dimensions (especially for sparse or decaying alternatives), and cases where distance-covariance tests fail to control size under the null. An empirical application to gene expression data demonstrates the method’s practical utility for identifying dependent features with FWER control in large-scale settings. Overall, the approach provides a computationally attractive, scalable framework for independence screening and inference in ultra-high-dimensional problems.
Abstract
This paper proposes a nonparametric test of pairwise independence of one random variable from a large pool of other random variables. The test statistic is the maximum of several Chatterjee rank correlations, and critical values are computed via a block multiplier bootstrap. We show in simulations that other popular tests based on distance covariances do not necessarily control size under this null. Our test, on the other hand, is shown to asymptotically control size uniformly over a large class of data-generating processes, even when the number of variables is much larger than the sample size. The test is consistent against any fixed alternative. It can be combined with a stepwise procedure for selecting those variables from the pool that violate independence, while controlling the family-wise error rate. All formal results leave the dependence among variables in the pool completely unrestricted. In simulations, we find that our test is typically more powerful than competing methods (in settings where they are valid), particularly in high-dimensional scenarios or when there is dependence among variables in the pool.
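To make the test statistic concrete, the following is a minimal Python sketch of the maximum of Chatterjee's rank correlations. It uses the standard no-ties formula for Chatterjee's coefficient and rests on several assumptions not stated in the abstract: the data are sorted by $X$ and the ranks are taken over each $Y_j$, the data are continuous (no ties), and the maximum is left unscaled. The function names `chatterjee_xi` and `max_xi_statistic` are hypothetical, and the block multiplier bootstrap used for critical values is not reproduced here because its details are not given in this section.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's rank correlation of y on x (no-ties formula).

    Assumption: observations are sorted by x and ranks are taken over y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Reorder y according to increasing x.
    order = np.argsort(x, kind="stable")
    y_sorted = y[order]
    # Ranks of the reordered y values, from 1 to n (valid when there are no ties).
    ranks = np.argsort(np.argsort(y_sorted)) + 1
    # xi = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1)
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)


def max_xi_statistic(x, Y):
    """Maximum of the p rank correlations between x and the columns of Y.

    Y has shape (n, p). Returns the unscaled maximum and the vector of
    individual coefficients; any normalization is left to the caller.
    """
    Y = np.asarray(Y, dtype=float)
    xis = np.array([chatterjee_xi(x, Y[:, j]) for j in range(Y.shape[1])])
    return xis.max(), xis


# Usage: one column depends on x, the others are independent noise.
rng = np.random.default_rng(0)
n, p = 500, 20
x = rng.normal(size=n)
Y = rng.normal(size=(n, p))
Y[:, 0] = np.sin(x)  # deterministic, non-monotone function of x
stat, xis = max_xi_statistic(x, Y)
```

In this example the first column is a noiseless function of `x`, so its coefficient is close to 1 and determines the maximum, while the coefficients of the independent columns fluctuate around 0. Sorting by `x` needs to be done only once in principle; the loop above repeats it per column for readability.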
