On the Number of Conditional Independence Tests in Constraint-based Causal Discovery

Marc Franquesa Monés, Jiaqi Zhang, Caroline Uhler

Abstract

Learning causal relations from observational data is a fundamental problem with wide-ranging applications across many fields. Constraint-based methods infer the underlying causal structure by performing conditional independence tests. However, existing algorithms such as the prominent PC algorithm need to perform a large number of independence tests, which in the worst case is exponential in the maximum degree of the causal graph. Despite extensive research, it remains unclear whether there exist algorithms with better complexity without additional assumptions. Here, we establish an algorithm that achieves a better complexity of $p^{\mathcal{O}(s)}$ tests, where $p$ is the number of nodes in the graph and $s$ denotes the maximum undirected clique size of the underlying essential graph. Complementing this result, we prove that any constraint-based algorithm must perform at least $2^{\Omega(s)}$ conditional independence tests, establishing that our proposed algorithm achieves exponent-optimality up to a logarithmic factor in terms of the number of conditional independence tests needed. Finally, we validate our theoretical findings through simulations, on semi-synthetic gene-expression data, and on real-world data, demonstrating the efficiency of our algorithm compared to existing methods in terms of the number of conditional independence tests needed.
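To make the notion of a conditional independence (CI) test concrete, the following is a minimal sketch of one common choice, the Fisher z-test on partial correlations, which assumes jointly Gaussian data. The paper summary above does not say which CI test the authors use; the function name `fisher_z_ci_test` and the simulated chain $X_0 \to X_1 \to X_2$ are illustrative assumptions.

```python
import math
import numpy as np

def fisher_z_ci_test(data, i, j, cond=()):
    """Return (partial correlation, two-sided p-value) for X_i independent of X_j given X_cond.

    Assumes the columns of `data` are jointly Gaussian (hypothetical helper,
    not taken from the paper).
    """
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected variables
    # Partial correlation of the first two variables given the rest.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = float(np.clip(r, -0.999999, 0.999999))  # keep the log below finite
    n = data.shape[0]
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal tail
    return r, p_value

# Simulated chain X0 -> X1 -> X2 (illustrative data, fixed seed).
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

r_marg, p_marg = fisher_z_ci_test(data, 0, 2)             # X0 vs X2, no conditioning
r_cond, p_cond = fisher_z_ci_test(data, 0, 2, cond=(1,))  # X0 vs X2 given X1
```

In this chain, $X_0$ and $X_2$ are marginally dependent but independent given $X_1$, so the marginal test should reject independence while the conditional partial correlation should be close to zero. Each such call counts as one CI test in the complexity bounds discussed above.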

Paper Structure

This paper contains 35 sections, 26 theorems, 4 equations, 16 figures, 3 tables, 5 algorithms.

Key Result

Theorem 1

Given infinite samples from a distribution respecting a DAG $\mathcal{G}$, GAS (i.e., Algorithm alg:learning) outputs $\mathcal{E}(\mathcal{G})$ using $p^{\mathcal{O}(s)}$ CI tests. Here, $s$ is the size of the maximum undirected clique in $\mathcal{E}(\mathcal{G})$.
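As a rough illustration of why a bound in terms of $s$ can be far smaller than one in terms of the maximum degree $d$, the sketch below counts candidate conditioning sets for a graph with $d = p - 2$ and $s = 2$. These are simplified counts under stated assumptions, not the exact number of tests run by PC or GAS: it compares the number of subsets of a $d$-node neighborhood ($2^d$) with the number of subsets of size at most $s$ drawn from all $p$ nodes, and it ignores the constant hidden in $\mathcal{O}(s)$. The helper `subsets_up_to` and the choice $p = 20$ are hypothetical.

```python
from math import comb

def subsets_up_to(n, k):
    """Number of subsets of an n-element set with size at most k."""
    return sum(comb(n, size) for size in range(k + 1))

p = 20     # number of nodes (illustrative choice)
d = p - 2  # maximum degree in the assumed example graph
s = 2      # maximum undirected clique size in the assumed example graph

# All subsets of one d-node neighborhood: exponential in d.
all_neighbor_subsets = subsets_up_to(d, d)  # equals 2**18 = 262144

# Subsets of size at most s among all p nodes: polynomial in p for fixed s.
small_subsets = subsets_up_to(p, s)  # equals 1 + 20 + 190 = 211
```

The gap between the two counts widens quickly as $p$ grows while $s$ stays fixed, which is the regime where a $p^{\mathcal{O}(s)}$ bound is most informative.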

Figures (16)

  • Figure 1: A graph with $p$ nodes, where $d=p-2$, and $s=2$.
  • Figure 2: Meek rules (Meek, 1995). The dashed lines partition nodes into sets based on ancestral relationships. Only Meek rule 1 establishes an ancestor for a node that previously had none, forcing a component to split.
  • Figure 3: Example graph illustrating the classification of nodes into sets $D_S^1$ and $F_S^1$ given the prefix node set $S = \{0,1\}$.
  • Figure 4: Comparison across Erdős-Rényi graphs on 50 nodes and increasing expected neighborhood size. Results are averaged over 3 runs. (a) Execution time (seconds) presented on a logarithmic scale. (b) Accuracy comparison measured by Structural Hamming Distance (SHD) between predicted and ground-truth graphs normalized by the number of possible edges. Shaded regions show the standard deviation.
  • Figure 5: Comparison on semi-synthetic data generated using SERGIO. Results are averaged over 5 runs. (a) Execution time (seconds) presented on a logarithmic scale. (b) Accuracy comparison measured by Structural Hamming Distance (SHD) between predicted and ground-truth graphs normalized by the number of possible edges.
  • ...and 11 more figures
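The accuracy metric named in the captions of Figures 4 and 5, Structural Hamming Distance (SHD) normalized by the number of possible edges, can be sketched as follows. SHD conventions differ between papers; this version assumes directed adjacency matrices (`A[i, j] = 1` meaning an edge from `i` to `j`), counts a missing, extra, or reversed edge as one error per node pair, and divides by $p(p-1)/2$. Whether the authors use exactly this convention is not stated in the captions, and the function name `normalized_shd` is hypothetical.

```python
import numpy as np

def normalized_shd(pred, true):
    """SHD between two directed adjacency matrices, divided by p*(p-1)/2.

    Assumed convention: each unordered node pair contributes at most one
    error, whether the edge is missing, extra, or reversed.
    """
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    p = pred.shape[0]
    upper = np.triu_indices(p, k=1)  # one entry per unordered pair (i < j)
    # A pair differs if either direction i->j or j->i disagrees.
    differs = (pred[upper] != true[upper]) | (pred.T[upper] != true.T[upper])
    return differs.sum() / (p * (p - 1) / 2)

# Ground truth: 0 -> 1 -> 2. Prediction: 0 -> 1, 2 -> 1 (reversed), 0 -> 2 (extra).
true_graph = np.array([[0, 1, 0],
                       [0, 0, 1],
                       [0, 0, 0]])
pred_graph = np.array([[0, 1, 1],
                       [0, 0, 0],
                       [0, 1, 0]])
score = normalized_shd(pred_graph, true_graph)  # 2 errors over 3 possible edges
```

Normalizing by the number of possible edges keeps the score in $[0, 1]$, which makes results comparable across graphs of different sizes.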

Theorems & Definitions (50)

  • Theorem 1
  • Theorem 2
  • Proposition 3
  • Definition 4: Set $D_S^m$
  • Definition 5: Set $F_S^m$
  • Theorem 6
  • Lemma 6
  • Lemma 6
  • Proof: Proof of Theorem \ref{thm:prefix-set}
  • Lemma 7: Lemma 7 in zhang2024membership
  • ...and 40 more