Causal Discovery with Fewer Conditional Independence Tests

Kirankumar Shiragur; Jiaqi Zhang; Caroline Uhler

TL;DR

The paper tackles the scalability challenge in causal discovery by introducing Causally Consistent Partition Graphs (CCPGs), a coarser yet informative representation that can be learned with a polynomial number of conditional independence (CI) tests. It develops a prefix-vertex-set framework together with proxy v-structure and proxy Meek rule results to construct CCPGs from observational data, with guarantees extending to interventional data via $\mathcal{I}$-CCPGs. The authors prove that a CCPG coincides with the true graph whenever that graph is fully identifiable from observational data alone or with a verifying set of interventions, and they provide an algorithm that uses on the order of $\mathcal{O}(n^5)$ CI tests (plus intervention terms). The approach gives an efficient route to exact recovery in these regimes while offering a coarse representation for complex causal systems. The empirical results illustrate favorable runtime and sample efficiency compared to traditional constraint-based methods, motivating further exploration of CCPG-based strategies in broader settings.

Abstract

Many questions in science center around the fundamental problem of understanding causal relationships. However, most constraint-based causal discovery algorithms, including the well-celebrated PC algorithm, often incur an exponential number of conditional independence (CI) tests, posing limitations in various applications. Addressing this, our work focuses on characterizing what can be learned about the underlying causal graph with a reduced number of CI tests. We show that it is possible to learn a coarser representation of the hidden causal graph with a polynomial number of tests. This coarser representation, named Causal Consistent Partition Graph (CCPG), comprises a partition of the vertices and a directed graph defined over its components. CCPG satisfies consistency of orientations and additional constraints which favor finer partitions. Furthermore, it reduces to the underlying causal graph when the causal graph is identifiable. As a consequence, our results offer the first efficient algorithm for recovering the true causal graph with a polynomial number of tests, in special cases where the causal graph is fully identifiable through observational data and potentially additional interventions.

Paper Structure (43 sections, 32 theorems, 3 equations, 9 figures, 3 algorithms)

Introduction
Related Works
Organization
Preliminaries
Graph Definitions
D-Separation and Conditional Independence
Interventions
Verifying Intervention Sets and Covered Edges
Main Results
Proxy V-structure and Meek Rule Statements
Prefix Vertex Set
Algorithm for Learning
Correctness and Guarantees
Relation to Covered Edges
Learning Causally Consistent Partition Graph Representations
...and 28 more sections

Key Result

Proposition 2.1

A set $\mathcal{I}$ is a verifying intervention set if and only if, for every covered edge $u\to v$ in $\mathcal{G}$, there exists some $I\in\mathcal{I}$ with $|I\cap\{u,v\}|=1$.
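The condition in Proposition 2.1 can be checked mechanically on a small DAG. The sketch below is only illustrative: it assumes the standard definition of a covered edge ($u\to v$ is covered when $\mathrm{Pa}(v)=\mathrm{Pa}(u)\cup\{u\}$), which this page does not restate, and the function names and the three-vertex example graph are hypothetical rather than taken from the paper.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (u, v) stands for the directed edge u -> v


def parents(edges: Iterable[Edge], v: int) -> Set[int]:
    """Return the parent set of v in the DAG given by `edges`."""
    return {a for a, b in edges if b == v}


def covered_edges(edges: List[Edge]) -> List[Edge]:
    """Edges u -> v with Pa(v) == Pa(u) | {u} (assumed standard definition)."""
    return [
        (u, v)
        for u, v in edges
        if parents(edges, v) == parents(edges, u) | {u}
    ]


def is_verifying_set(edges: List[Edge], interventions: List[Set[int]]) -> bool:
    """Check the Proposition 2.1 condition: every covered edge u -> v must
    have exactly one endpoint inside at least one intervention target I."""
    return all(
        any(len(I & {u, v}) == 1 for I in interventions)
        for u, v in covered_edges(edges)
    )


# Hypothetical example: the complete DAG 1 -> 2, 1 -> 3, 2 -> 3.
example_dag = [(1, 2), (1, 3), (2, 3)]
print(covered_edges(example_dag))
print(is_verifying_set(example_dag, [{2}]))
print(is_verifying_set(example_dag, [{1}]))
```

In this example the covered edges are $1\to 2$ and $2\to 3$, so intervening on vertex 2 alone separates the endpoints of both, while intervening on vertex 1 alone leaves $2\to 3$ unseparated.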

Figures (9)

Figure 1: (Left) $\{1\}$ and $\{4\}$ are d-separated by $\{2\}$, as all paths are inactive given $\{2\}$. (Right) $\{1\}$ and $\{4\}$ are not d-separated by $\{2,3\}$, as path $1\to 3\leftarrow 4$ is active given $\{2,3\}$ by collider $3$.
Figure 2: Example of CCPG & $\mathcal{I}$-CCPG. (Left) Ground-truth $\mathcal{G}$. (Right) A CCPG representation of $\mathcal{G}$, where $V_1,V_2,V_3$ are indicated by green boxes and $\mathcal{D}$ is illustrated in chalk strokes. Vertices $3,4$ can be in one component as $3\to 4$ is a covered edge. For $\mathcal{I}=\{4\}$, the only $\mathcal{I}$-CCPG is $\mathcal{G}$ itself (due to the strong intra-component condition in Definition 3.1).
Figure 3: $D_S,E_S,F_S$ satisfy that (1) they contain all downstream vertices of any vertex in them; (2) they do not intersect with $\mathrm{src}(\bar{S})$.
Figure 4: Illustration of $\bar{S}\setminus D_S$ (inside the dashed box). It can be split into connected subgraphs based on vertices in $\mathrm{src}(\bar{S})$ (indicated by the fill color of each vertex in $\bar{S}\setminus D_S$).
Figure 5: Illustration of $J_S^I$, where $I$ is indicated by the purple circle. $J_S^I$ satisfies similar properties as $D_S,E_S$ and $F_S$.
...and 4 more figures
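The d-separation statements in the caption of Figure 1 can be reproduced with a short check. This is a minimal sketch using the classical ancestral moral graph criterion, not the paper's own procedure. The figure itself is not available here, so the edge list below is a hypothetical graph chosen only to be consistent with the caption (it contains the path $1\to 3\leftarrow 4$); the function names are also assumptions.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (u, v) stands for the directed edge u -> v


def ancestral_closure(edges: List[Edge], nodes: Iterable[int]) -> Set[int]:
    """Return `nodes` together with all of their ancestors."""
    closure = set(nodes)
    frontier = list(closure)
    while frontier:
        v = frontier.pop()
        for a, b in edges:
            if b == v and a not in closure:
                closure.add(a)
                frontier.append(a)
    return closure


def d_separated(edges: List[Edge], xs: Set[int], ys: Set[int], zs: Set[int]) -> bool:
    """Test whether xs and ys are d-separated by zs via moralization."""
    # 1. Restrict to the ancestral subgraph of xs, ys and zs.
    keep = ancestral_closure(edges, xs | ys | zs)
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # 2. Moralize: drop directions and connect parents of a common child.
    undirected = {frozenset(e) for e in sub}
    for v in keep:
        pa = [a for a, b in sub if b == v]
        for i in range(len(pa)):
            for j in range(i + 1, len(pa)):
                undirected.add(frozenset((pa[i], pa[j])))
    # 3. Delete zs and test whether any vertex of ys is reachable from xs.
    seen = set(xs) - zs
    frontier = list(seen)
    while frontier:
        v = frontier.pop()
        for e in undirected:
            if v in e:
                (w,) = e - {v}
                if w not in zs and w not in seen:
                    seen.add(w)
                    frontier.append(w)
    return not (seen & ys)


# Hypothetical DAG consistent with the Figure 1 caption.
figure1_like = [(1, 2), (2, 4), (1, 3), (4, 3)]
print(d_separated(figure1_like, {1}, {4}, {2}))
print(d_separated(figure1_like, {1}, {4}, {2, 3}))
```

On this assumed graph, conditioning on $\{2\}$ blocks the chain through vertex 2 while the collider at 3 stays closed, and adding 3 to the conditioning set opens the collider, matching the two panels described in the caption.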

Theorems & Definitions (58)

Proposition 2.1: Theorem 9 in Choo et al. (2022)
Definition 3.1: CCPG & $\mathcal{I}$-CCPG
Lemma 3.1: Properties of CCPG
Theorem 3.2: Learning CCPG
Theorem 3.3: $\mathcal{I}$-Learning CCPG
Corollary 3.4: Causal Discovery with Polynomial CI Tests
Lemma 3.4: Proxy V-Structure
Definition 3.5: Prefix Vertex Set
Lemma 3.5: Proxy Meek Rule 1
Definition 4.1: Type-I Set $D_S$
...and 48 more
