On the Number of Conditional Independence Tests in Constraint-based Causal Discovery

Marc Franquesa Monés; Jiaqi Zhang; Caroline Uhler

Abstract

Learning causal relations from observational data is a fundamental problem with wide-ranging applications across many fields. Constraint-based methods infer the underlying causal structure by performing conditional independence tests. However, existing algorithms such as the prominent PC algorithm need to perform a large number of independence tests, which in the worst case is exponential in the maximum degree of the causal graph. Despite extensive research, it remains unclear whether algorithms with better complexity exist without additional assumptions. Here, we establish an algorithm that achieves a better complexity of $p^{\mathcal{O}(s)}$ tests, where $p$ is the number of nodes in the graph and $s$ denotes the maximum undirected clique size of the underlying essential graph. Complementing this result, we prove that any constraint-based algorithm must perform at least $2^{\Omega(s)}$ conditional independence tests, establishing that our proposed algorithm achieves exponent-optimality up to a logarithmic factor in terms of the number of conditional independence tests needed. Finally, we validate our theoretical findings through simulations, on semi-synthetic gene-expression data, and on real-world data, demonstrating the efficiency of our algorithm compared to existing methods in terms of the number of conditional independence tests needed.

Paper Structure (35 sections, 26 theorems, 4 equations, 16 figures, 3 tables, 5 algorithms)

Introduction
Related works
Organization
Preliminaries
Graph definitions
Markov equivalence classes
Main results
Outline of methods
Upper bound
Learning a prefix node set
Learning the essential graph via GAS
Lower bound
Experiments
Linear Gaussian synthetic data
SERGIO
...and 20 more sections

Key Result

Theorem 1

Given infinite samples from a distribution respecting a DAG $\mathcal{G}$, GAS (Algorithm \ref{alg:learning}) outputs the essential graph $\mathcal{E}(\mathcal{G})$ using $p^{\mathcal{O}(s)}$ CI tests. Here, $p$ is the number of nodes and $s$ is the size of the largest undirected clique in $\mathcal{E}(\mathcal{G})$.
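
The abstract's claim of exponent-optimality up to a logarithmic factor can be read off by writing the upper bound of Theorem 1 and the lower bound stated in the abstract in the same base. The following is a short worked rewriting, not taken from the paper's proofs; the choice of base 2 for the logarithm is only a convention, and `\text` assumes the `amsmath` package.

```latex
\[
  \underbrace{p^{\mathcal{O}(s)}}_{\text{upper bound (GAS)}}
  \;=\; \bigl(2^{\log_2 p}\bigr)^{\mathcal{O}(s)}
  \;=\; 2^{\mathcal{O}(s \log_2 p)},
  \qquad
  \underbrace{2^{\Omega(s)}}_{\text{lower bound (any algorithm)}} .
\]
% The exponents are O(s log p) and Omega(s):
% they differ by at most a factor of order log p.
```

In other words, the two bounds match in the exponent except for a multiplicative $\log p$ term.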

Figures (16)

Figure 1: A graph with $p$ nodes, where $d=p-2$, and $s=2$.
Figure 2: Meek rules (Meek, 1995). The dashed lines partition nodes into sets based on ancestral relationships. Only Meek rule 1 establishes an ancestor for a node that previously had none, forcing a component to split.
Figure 3: Example graph illustrating the classification of nodes into sets $D_S^1$ and $F_S^1$ given the prefix node set $S = \{0,1\}$.
Figure 4: Comparison across Erdős-Rényi graphs on 50 nodes and increasing expected neighborhood size. Results are averaged over 3 runs. (a) Execution time (seconds) presented on a logarithmic scale. (b) Accuracy comparison measured by Structural Hamming Distance (SHD) between predicted and ground-truth graphs normalized by the number of possible edges. Shaded regions show the standard deviation.
Figure 5: Comparison on semi-synthetic data generated using SERGIO. Results are averaged over 5 runs. (a) Execution time (seconds) presented on a logarithmic scale. (b) Accuracy comparison measured by Structural Hamming Distance (SHD) between predicted and ground-truth graphs normalized by the number of possible edges.
...and 11 more figures
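
The regime of Figure 1 ($d = p-2$, $s = 2$) shows why a bound in terms of $s$ can be much smaller than one in terms of the maximum degree $d$. The sketch below compares two representative quantities, $2^d$ and $p^s$. Both are assumptions chosen for illustration: the constants hidden in the $\mathcal{O}$-notation are ignored, and neither value is an actual test count of PC or GAS. The function name `illustrative_growth` is hypothetical.

```python
def illustrative_growth(p: int) -> tuple[int, int]:
    """Return (2**d, p**s) for the regime of Figure 1: d = p - 2, s = 2.

    These are stand-ins for "exponential in the maximum degree" and
    "p^{O(s)}" with all hidden constants set to 1 (an assumption made
    purely for illustration, not a count of real CI tests).
    """
    d = p - 2  # maximum degree in the Figure 1 example
    s = 2      # maximum undirected clique size in the Figure 1 example
    return 2 ** d, p ** s


if __name__ == "__main__":
    for p in (10, 20, 50):
        exp_in_d, poly_in_p = illustrative_growth(p)
        print(f"p={p:3d}  2^d={exp_in_d:>16d}  p^s={poly_in_p:>6d}")
```

Even with these crude stand-ins, the degree-based quantity grows exponentially in $p$ while the clique-based quantity grows only quadratically in this regime.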

Theorems & Definitions (50)

Theorem 1
Theorem 2
Proposition 3
Definition 4: Set $D_S^m$
Definition 5: Set $F_S^m$
Theorem 6
Lemma 6
Lemma 6
Proof: Proof of Theorem \ref{thm:prefix-set}
Lemma 7: Lemma 7 in Zhang et al. (2024)
...and 40 more
