Causal Discovery with Fewer Conditional Independence Tests
Kirankumar Shiragur, Jiaqi Zhang, Caroline Uhler
TL;DR
The paper tackles the scalability challenge in causal discovery by introducing Causal Consistent Partition Graphs (CCPGs), a coarser yet informative representation that can be learned with a polynomial number of conditional independence (CI) tests. It develops a prefix-vertex-set framework and proxy Meek/v-structure results to construct CCPGs from observational data, with guarantees extending to interventions via $\text{I}$-CCPGs. The authors prove that a CCPG is sufficient to identify the true graph when that graph is fully identifiable from observational data alone or with a verifying set of interventions, and they provide an algorithm that uses on the order of $\mathcal{O}(n^5)$ CI tests (plus intervention terms). This approach yields a principled, efficient path to exact recovery in these regimes while offering a practical coarse representation for complex causal systems. The empirical results illustrate favorable runtime and sample efficiency compared to traditional constraint-based methods, motivating further exploration of CCPG-based strategies in broader settings.
Abstract
Many questions in science center around the fundamental problem of understanding causal relationships. However, most constraint-based causal discovery algorithms, including the well-celebrated PC algorithm, often incur an exponential number of conditional independence (CI) tests, posing limitations in various applications. Addressing this, our work focuses on characterizing what can be learned about the underlying causal graph with a reduced number of CI tests. We show that it is possible to learn a coarser representation of the hidden causal graph with a polynomial number of tests. This coarser representation, named Causal Consistent Partition Graph (CCPG), comprises a partition of the vertices and a directed graph defined over its components. CCPG satisfies consistency of orientations and additional constraints which favor finer partitions. Furthermore, it reduces to the underlying causal graph when the causal graph is identifiable. As a consequence, our results offer the first efficient algorithm for recovering the true causal graph with a polynomial number of tests, in special cases where the causal graph is fully identifiable through observational data and potentially additional interventions.
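To give a rough sense of the gap between an exponential and a polynomial number of CI tests, the following sketch compares two illustrative counts as the number of variables $n$ grows. Both functions are assumptions made for illustration and are not taken from the paper: `exhaustive_ci_tests` counts every conditioning set for every pair of variables (a naive worst case, not the exact behavior of PC), and `polynomial_ci_tests` is simply $n^5$ with the hidden constant set to 1.

```python
from math import comb


def exhaustive_ci_tests(n: int) -> int:
    """Naive worst case: every pair of variables, tested against every
    subset of the remaining n - 2 variables (illustrative assumption)."""
    return comb(n, 2) * 2 ** (n - 2)


def polynomial_ci_tests(n: int) -> int:
    """n^5 with the constant factor assumed to be 1 (illustrative only)."""
    return n ** 5


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(f"n={n:>2}  exhaustive={exhaustive_ci_tests(n):>18,}  "
              f"n^5={polynomial_ci_tests(n):>12,}")
```

Under these assumptions, the polynomial count is the larger of the two for small $n$, and the exhaustive count overtakes it somewhere between $n = 10$ and $n = 20$, after which the gap widens rapidly.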
