Membership Testing in Markov Equivalence Classes via Independence Query Oracles

Jiaqi Zhang, Kirankumar Shiragur, Caroline Uhler

TL;DR

This work studies the problem of testing whether a hidden causal DAG belongs to a given Markov equivalence class using independence-query oracles. It proves a worst-case lower bound of $\exp(\Omega(s))$ conditional independence tests, where $s$ is the size of the maximum undirected clique in the MEC's essential graph, and provides an algorithm achieving $\exp(O(s)) + O(\log n)$ tests, matching the bound in the exponent. By introducing two canonical CI test oracles and leveraging the DAG associahedron, the authors offer a geometric interpretation and show that testing can be significantly easier than learning, especially for graphs with high in-degree but small clique size. The results yield instance-dependent, tight bounds and illuminate how testing can aid learning by focusing on a restricted MEC and exploiting structural graph properties. The work has potential practical impact for validating predefined causal hypotheses with limited data and guiding subsequent learning steps.

Abstract

Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific fields. While extensive research has been dedicated to learning causal graphs from data, its complementary concept of testing causal relationships has remained largely unexplored. While learning involves the task of recovering the Markov equivalence class (MEC) of the underlying causal graph from observational data, the testing counterpart addresses the following critical question: Given a specific MEC and observational data from some causal graph, can we determine if the data-generating causal graph belongs to the given MEC? We explore constraint-based testing methods by establishing bounds on the required number of conditional independence tests. Our bounds are in terms of the size of the maximum undirected clique ($s$) of the given MEC. In the worst case, we show a lower bound of $\exp(\Omega(s))$ independence tests. We then give an algorithm that resolves the task with $\exp(O(s))$ tests, matching our lower bound. Compared to the learning problem, where algorithms often use a number of independence tests that is exponential in the maximum in-degree, this shows that testing is relatively easier. In particular, it requires exponentially fewer independence tests in graphs featuring high in-degrees and small clique sizes. Additionally, using the DAG associahedron, we provide a geometric interpretation of testing versus learning and discuss how our testing result can aid learning.

Paper Structure (29 sections, 15 theorems, 5 equations, 8 figures, 1 algorithm)

Key Result

Theorem 1

Given a specific MEC $[\mathcal{G}]$, there exists a hidden DAG $\mathcal{H}$ such that any algorithm requires at least $\exp(\Omega(s))$ CI tests to test if $\mathcal{H}\in[\mathcal{G}]$. Here, $s$ is the size of the maximum undirected clique in $\mathcal{E}(\mathcal{G})$.
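
The quantity $s$ in Theorem 1 can be made concrete with a small sketch. The code below is an illustration only, not the paper's algorithm: it brute-forces the largest clique among the undirected edges of an essential graph. The function name `max_undirected_clique_size` and the example edge list are hypothetical and are not taken from the paper's figures.

```python
from itertools import combinations

def max_undirected_clique_size(nodes, undirected_edges):
    """Size of the largest clique formed only by undirected edges.

    Brute force over node subsets, so only suitable for small graphs.
    """
    adjacent = {frozenset(edge) for edge in undirected_edges}
    # Try the largest subsets first; the first clique found is a maximum one.
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if all(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                return k
    return 0

# Hypothetical undirected part of an essential graph (an assumption for
# illustration): a triangle 1-2-3 with a pendant edge 3-4.
nodes = [1, 2, 3, 4]
undirected_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
s = max_undirected_clique_size(nodes, undirected_edges)  # s == 3
```

Directed edges of the essential graph are deliberately left out of the input, since only undirected edges count toward $s$.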

Figures (8)

  • Figure 1: (Left) $\{1\}$ and $\{4\}$ are d-separated by $\{2\}$, as both paths are inactive given $\{2\}$. (Right) $\{1\}$ and $\{4\}$ are not d-separated by $\{2,3\}$, as the path $1\to 3\leftarrow 4$ is active given $\{2,3\}$ by the collider $3$.
  • Figure 2: (Left). DAG $\mathcal{G}$. (Right). Essential graph $\mathcal{E}(\mathcal{G})$ representing $[\mathcal{G}]$. In $\mathcal{E}(\mathcal{G})$, the maximum undirected clique has size $s=3$ (highlighted in green). The maximum in-degree of $\mathcal{G}$ is $d=5$ (on node $4$).
  • Figure 3: Examples of canonical CI tests. The given MEC $\mathcal{G}$ is on the left, and the hidden $\mathcal{H}$ is on the right. (a) Class-I CI test $1 \perp\!\!\!\perp 4\mid 2$ agrees between $\mathcal{H},\mathcal{G}$. (b) Class-II CI test $3 \perp\!\!\!\perp 4\mid 1$ disagrees between $\mathcal{H},\mathcal{G}$.
  • Figure 4: (Left) Permutohedron $\mathcal{A}_3$. (Right) DAG associahedron $\mathcal{A}_3(1\to 3\leftarrow 2)$. The corresponding DAG and contracted edge are in red.
  • Figure 5: Illustration of Meek rule 1.
  • ...and 3 more figures
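
The d-separation statements in the Figure 1 caption can be checked with a minimal sketch. It uses the standard moralized-ancestral-graph criterion, which is a textbook technique and not something the source attributes to the paper. The edge list is an assumption: the caption only names the path $1\to 3\leftarrow 4$, so the second path is taken to be $1\to 2\to 4$ for illustration. The function name `d_separated` is hypothetical.

```python
from itertools import combinations

def d_separated(edges, xs, ys, zs):
    """Return True if node sets xs and ys are d-separated given zs in a DAG."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        parents.setdefault(u, set())
    # Step 1: keep only xs, ys, zs and all of their ancestors.
    keep, stack = set(), list(set(xs) | set(ys) | set(zs))
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    # Step 2: moralize. Drop edge directions and connect parents that share a child.
    neighbors = {v: set() for v in keep}
    for child in keep:
        pars = parents.get(child, set())
        for p in pars:
            neighbors[child].add(p)
            neighbors[p].add(child)
        for p, q in combinations(pars, 2):
            neighbors[p].add(q)
            neighbors[q].add(p)
    # Step 3: delete zs and test whether any node in ys is reachable from xs.
    blocked = set(zs)
    seen, stack = set(), [x for x in xs if x not in blocked]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in neighbors[node] if n not in blocked and n not in seen)
    return not (seen & set(ys))

# Assumed DAG for Figure 1: paths 1 -> 2 -> 4 and 1 -> 3 <- 4.
edges = [(1, 2), (2, 4), (1, 3), (4, 3)]
left = d_separated(edges, {1}, {4}, {2})      # True: both paths are inactive
right = d_separated(edges, {1}, {4}, {2, 3})  # False: conditioning on collider 3 opens 1 -> 3 <- 4
```

Under this assumed graph, the two results agree with the left and right panels described in the caption.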

Theorems & Definitions (26)

  • Theorem 1
  • Theorem 2
  • Definition 3: Class-I CI Test
  • Definition 4: Class-II CI Test
  • Lemma 5
  • Corollary 6
  • proof
  • Lemma 7
  • proof: Proof of Theorem 1.
  • Lemma 8
  • ...and 16 more