Pearson Chi-squared Conditional Randomization Test
Adel Javanmard, Mohammad Mehrabi
TL;DR
The paper tackles conditional independence testing in the model-X framework by introducing the PCR test, which leverages distributional information on covariates to construct counterfactuals and perform a high-resolution multinomial Pearson $\chi^2$ test on grouped scores. PCR achieves valid finite-sample and asymptotic size control with a small number of randomizations, and its power is analyzed through the conditional relative density, revealing regimes where PCR outperforms existing CRT-based methods. The paper also introduces two practical extensions: parameter-free PCR via Bonferroni aggregation, and robust PCR, which remains valid under misspecification of $P_{X|Z}$. Empirically, PCR demonstrates superior power and computational efficiency in simulations and on real data (Capital Bikeshare), underscoring its usefulness for scalable CI testing in complex, high-dimensional problems.
Abstract
Conditional independence (CI) testing arises naturally in many scientific problems and application domains. The goal is to assess the conditional independence between a response variable $Y$ and another variable $X$, while controlling for the effect of a high-dimensional confounding variable $Z$. In this paper, we introduce a novel test, called the "Pearson Chi-squared Conditional Randomization" (PCR) test, which uses distributional information on the covariates $X, Z$ and constructs randomizations to test conditional independence. PCR leverages the i.i.d. property of the observations to obtain high-resolution p-values with a very small number of conditional randomizations. We also provide a power analysis of the PCR test, which captures the effect of various parameters of the test, the sample size, and the distance of the alternative from the set of null distributions, measured in terms of a notion called "conditional relative density". In addition, we propose two extensions of the PCR test with important practical implications: (i) parameter-free PCR, which uses Bonferroni's correction to decide on a tuning parameter in the test; (ii) robust PCR, which avoids inflation of the test's size when there is a slight error in estimating the conditional law $P_{X|Z}$.
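
To make the idea of combining conditional randomizations with a Pearson $\chi^2$ test concrete, the following is a minimal illustrative sketch, not the authors' exact procedure. It assumes a toy Gaussian model with $X \mid Z \sim N(Z, 1)$, a hypothetical score function $T(x, z, y) = y\,(x - z)$, one rank label per sample with $L = K + 1$ labels, and an even number of degrees of freedom so that the $\chi^2$ tail has a closed form. All function names and parameter values are assumptions made for illustration; the paper's procedure may differ in how samples are grouped and how the number of labels is chosen.

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper tail of a chi-squared law; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def pcr_style_pvalue(data, sample_x_given_z, score, K, rng):
    """Rank each observed score among K counterfactual scores, then run a
    Pearson chi-squared test of uniformity on the rank counts."""
    L = K + 1  # under the null, each rank in {0, ..., K} is equally likely
    counts = [0] * L
    for x, z, y in data:
        t_obs = score(x, z, y)
        # Counterfactual covariates are drawn from the assumed known P_{X|Z}.
        rank = sum(
            score(sample_x_given_z(z, rng), z, y) < t_obs for _ in range(K)
        )
        counts[rank] += 1
    expected = len(data) / L
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_sf_even_df(stat, L - 1)  # asymptotic p-value


def simulate(n, beta, rng):
    """Toy data: Z ~ N(0,1), X | Z ~ N(Z,1), Y = Z + beta * X + noise."""
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(z, 1.0)
        y = z + beta * x + rng.gauss(0.0, 1.0)
        data.append((x, z, y))
    return data


def draw_x(z, rng):
    return rng.gauss(z, 1.0)  # assumed conditional law P_{X|Z}


def toy_score(x, z, y):
    return y * (x - z)  # hypothetical score function


rng = random.Random(0)
K = 4  # only four randomizations per sample; L - 1 = 4 degrees of freedom
p_null = pcr_style_pvalue(simulate(2000, 0.0, rng), draw_x, toy_score, K, rng)
p_alt = pcr_style_pvalue(simulate(2000, 1.0, rng), draw_x, toy_score, K, rng)
print(f"p-value under the null (beta = 0): {p_null:.4f}")
print(f"p-value under the alternative (beta = 1): {p_alt:.3g}")
```

In this sketch the resolution of the p-value comes from the number of samples entering the $\chi^2$ statistic rather than from the number of randomizations, which is why a small $K$ suffices here. With $\beta = 0$ the ranks are exchangeable and the p-value is approximately uniform; with $\beta = 1$ the rank counts depart from uniformity and the p-value should be very small.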
