Score-based Generative Modeling for Conditional Independence Testing
Yixin Ren, Chenghou Jin, Yewei Xia, Li Ke, Longtao Huang, Hui Xue, Hao Zhang, Jihong Guan, Shuigeng Zhou
TL;DR
This work tackles conditional independence testing in high dimensions by introducing SGMCIT, a score-based CI testing framework that leverages sliced conditional score matching to learn $s(x,z;\theta) \approx \nabla_x \log p(x|z)$ and Langevin dynamics to generate null-hypothesis samples. A goodness-of-fit stage validates the generated distribution, improving reliability and interpretability. The authors provide theoretical guarantees, including identifiability, consistency of the score estimator, and asymptotic Type I error control, and show state-of-the-art performance on synthetic and real data across diverse settings. The approach offers robust Type I error control with high testing power and scalable computation, representing a promising direction for generative-model-based CI testing.
Abstract
Determining conditional independence (CI) relationships between random variables is a fundamental yet challenging task in machine learning and statistics, especially in high-dimensional settings. Existing generative model-based CI testing methods, such as those utilizing generative adversarial networks (GANs), often model conditional distributions poorly and suffer from training instability, resulting in subpar performance. To address these issues, we propose a novel CI testing method based on score-based generative modeling, which achieves precise Type I error control and strong testing power. Concretely, we first employ a sliced conditional score matching scheme to accurately estimate the conditional score, and then use Langevin dynamics conditional sampling to generate null hypothesis samples, ensuring precise Type I error control. We then incorporate a goodness-of-fit stage into the method to verify the generated samples and enhance interpretability in practice. We theoretically establish an error bound for the conditional distributions modeled by score-based generative models and prove the validity of our CI tests. Extensive experiments on both synthetic and real-world datasets show that our method significantly outperforms existing state-of-the-art methods, providing a promising way to revitalize generative model-based CI testing.
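As a rough illustration of the Langevin dynamics conditional sampling step mentioned above, the following minimal sketch draws approximate samples from $p(x|z)$ given a conditional score function. It is not the authors' implementation: the toy model $X \mid Z=z \sim \mathcal{N}(az, \sigma^2)$, its analytic score (used here in place of a learned network $s(x,z;\theta)$), the function names, and the step size and iteration count are all assumptions chosen for illustration. The test statistic and the goodness-of-fit stage are not shown.

```python
import math
import random


def conditional_score(x, z, a=1.5, sigma=1.0):
    """Analytic score d/dx log p(x | z) for the toy model X | Z=z ~ N(a*z, sigma^2).

    Hypothetical stand-in for a learned score network s(x, z; theta).
    """
    return -(x - a * z) / sigma**2


def langevin_sample(z, score_fn, rng, step=0.05, n_steps=400):
    """Draw one approximate sample from p(x | z) via unadjusted Langevin dynamics."""
    x = rng.gauss(0.0, 1.0)  # arbitrary initialization
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Update: x <- x + (step / 2) * score(x, z) + sqrt(step) * noise
        x = x + 0.5 * step * score_fn(x, z) + math.sqrt(step) * noise
    return x


def sample_null(z_values, score_fn, seed=0):
    """Generate one x for each observed z, i.e. samples that depend on z only."""
    rng = random.Random(seed)
    return [langevin_sample(z, score_fn, rng) for z in z_values]
```

Because each generated $x$ depends only on its $z$, the generated samples are conditionally independent of any other variable $Y$ given $Z$ by construction, which is what makes them usable as draws under the null hypothesis. Calling `sample_null([2.0] * 1000, conditional_score)` should give values scattered around $a z = 3$ with variance close to $\sigma^2 = 1$, up to the small bias introduced by the finite step size.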
