Optimal structure learning and conditional independence testing
Ming Gao, Yuhao Wang, Bryon Aragam
TL;DR
The paper establishes a fundamental link between the minimax rates for conditional independence (CI) testing and for structure learning of poly-forests (DAGs whose skeletons are forests), showing via a general reduction that the optimal sample complexity of structure learning is governed by the hardness of CI testing. It derives explicit rates for Bernoulli, Gaussian, and nonparametric models: the CI-testing radius $c$ scales with the sample size $n$ as $c \asymp n^{-1/\alpha}$ for a model-dependent exponent $\alpha$, and the sample complexity of learning a poly-forest on $d$ nodes scales as $n \asymp \frac{\log d}{c^{\alpha}}$. The authors describe an efficient PC-tree-based algorithm that attains these rates when supplied with an optimal CI test, and they validate the theory with experiments across distributional settings. The work thus provides a unified statistical framework linking CI testing and structure learning, with implications for sample-efficient DAG recovery and practical algorithm design. The results suggest extensions to general DAGs and highlight the central role of CI testing as a subroutine for scalable, minimax-optimal structure learning.
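As a small worked step, the two scalings quoted in the summary are consistent with each other: inverting the testing rate gives the sample size needed for a single CI test at radius $c$, and the structure-learning rate multiplies that by $\log d$. One common heuristic reading (an assumption here, not a claim taken from the paper) is that the $\log d$ factor pays for controlling errors across the many tests run over $d$ nodes. The numbers in the last line ($\alpha = 2$, $c = 0.1$, $d = 100$, natural logarithm) are purely illustrative.

```latex
\begin{align*}
c \asymp n^{-1/\alpha}
  &\iff n \asymp c^{-\alpha}
  && \text{(one CI test at radius } c\text{)} \\
n &\asymp \frac{\log d}{c^{\alpha}}
  && \text{(poly-forest on } d \text{ nodes)} \\
\alpha = 2,\ c = 0.1,\ d = 100:\quad
n &\asymp \frac{\log 100}{(0.1)^{2}} \approx \frac{4.6}{0.01} = 460
  && \text{(up to constants)}
\end{align*}
```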
Abstract
We establish a fundamental connection between optimal structure learning and optimal conditional independence testing by showing that the minimax optimal rate for structure learning problems is determined by the minimax rate for conditional independence testing in these problems. This is accomplished by establishing a general reduction between these two problems in the case of poly-forests, and demonstrated by deriving optimal rates for several examples, including Bernoulli, Gaussian and nonparametric models. Furthermore, we show that the optimal algorithm in these settings is a suitable modification of the PC algorithm. This theoretical finding provides a unified framework for analyzing the statistical complexity of structure learning through the lens of minimax testing.
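To make the idea of running a PC-style procedure on top of a CI test concrete, here is a minimal Python sketch for the Gaussian case. It is not the authors' algorithm: the function names (`partial_corr`, `pc_style_skeleton`, `simulate_chain`) and the `threshold` parameter are hypothetical, plain thresholding of sample partial correlations stands in for an optimal CI test, and restricting conditioning sets to size at most one is an assumption that suits forest-shaped skeletons under faithfulness. The sketch recovers only the undirected skeleton, not edge orientations.

```python
import itertools
import math

import numpy as np


def partial_corr(corr, i, j, k=None):
    """Sample correlation of i and j, or their partial correlation given k."""
    if k is None:
        return float(corr[i, j])
    num = corr[i, j] - corr[i, k] * corr[j, k]
    # Assumes no pair of variables is perfectly correlated (den > 0).
    den = math.sqrt((1.0 - corr[i, k] ** 2) * (1.0 - corr[j, k] ** 2))
    return float(num / den)


def pc_style_skeleton(X, threshold):
    """Keep edge i-j unless some conditioning set of size <= 1 separates i and j.

    `threshold` plays the role of the testing radius: a (partial) correlation
    smaller than it in absolute value is declared a conditional independence.
    """
    d = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    edges = set()
    for i, j in itertools.combinations(range(d), 2):
        # None encodes the empty conditioning set (a marginal test).
        cond_sets = [None] + [k for k in range(d) if k not in (i, j)]
        separated = any(
            abs(partial_corr(corr, i, j, k)) < threshold for k in cond_sets
        )
        if not separated:
            edges.add((i, j))
    return edges


def simulate_chain(n, seed=0):
    """Toy linear Gaussian poly-forest: X0 -> X1 -> X2, with X3 isolated."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)
    x1 = 0.8 * x0 + rng.standard_normal(n)
    x2 = 0.8 * x1 + rng.standard_normal(n)
    x3 = rng.standard_normal(n)
    return np.column_stack([x0, x1, x2, x3])


if __name__ == "__main__":
    X = simulate_chain(5000)
    print(sorted(pc_style_skeleton(X, threshold=0.1)))
```

In this toy model, the pair (X0, X2) is dependent marginally but separated by X1, so a single-node conditioning set is enough to remove that edge. Shrinking `threshold` makes weaker dependencies detectable but requires more samples for the sample partial correlations to concentrate, which is the trade-off the minimax rates quantify.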
