Interventional Causal Discovery in a Mixture of DAGs

Burak Varıcı, Dmitriy Katz-Rogozhnikov, Dennis Wei, Prasanna Sattigeri, Ali Tajer

TL;DR

This work addresses causal discovery when data arise from a mixture of DAGs over the same variables, where skeleton uncertainty and cross-component cycles pose fundamental challenges. It establishes matching necessary and sufficient intervention sizes to identify true edges (edges present in at least one component) and designs CADIM, an adaptive algorithm that recovers all true edges with $\mathcal{O}(n^2)$ interventions. The analysis introduces cyclic complexity to bound the intervention size gap in the presence of cycles, showing the gap is at most the local cyclic complexity $\tau_i$ and that the worst-case per-node intervention size is $|{\rm pa}_{\rm m}(i)|+\tau_i+1$. Empirically, CADIM demonstrates robust performance on synthetic mixtures, confirming the necessity of interventions for skeleton learning and achieving near-optimal intervention budgets even with moderate cyclic complexity, which has practical implications for complex domains such as genomics and dynamical systems.

Abstract

Causal interactions among a group of variables are often modeled by a single causal graph. In some domains, however, these interactions are best described by multiple co-existing causal graphs, e.g., in dynamical systems or genomics. This paper addresses the hitherto unknown role of interventions in learning causal interactions among variables governed by a mixture of causal systems, each modeled by one directed acyclic graph (DAG). Causal discovery from mixtures is fundamentally more challenging than single-DAG causal discovery. Two major difficulties stem from (i) an inherent uncertainty about the skeletons of the component DAGs that constitute the mixture and (ii) possibly cyclic relationships across these component DAGs. This paper addresses these challenges and aims to identify edges that exist in at least one component DAG of the mixture, referred to as the true edges. First, it establishes matching necessary and sufficient conditions on the size of interventions required to identify the true edges. Next, guided by the necessity results, an adaptive algorithm is designed that learns all true edges using $O(n^2)$ interventions, where $n$ is the number of nodes. Remarkably, the size of the interventions is optimal if the underlying mixture model does not contain cycles across its components. More generally, the gap between the intervention size used by the algorithm and the optimal size is quantified. It is shown to be bounded by the cyclic complexity number of the mixture model, defined as the size of the minimal intervention that can break the cycles in the mixture, which is upper bounded by the number of cycles among the ancestors of a node.
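
The notion of a minimal intervention that breaks the cycles can be made concrete with a small brute-force sketch. It is an illustration under stated assumptions, not the paper's algorithm: it treats an intervention as removing every edge that points into an intervened node, and it searches over the whole graph, whereas the paper's cyclic complexity is defined locally per node. The function names are hypothetical, and the edge list is the set of true edges given in the Figure 1 caption.

```python
from itertools import combinations


def has_cycle(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains a cycle."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    state = {v: 0 for v in nodes}  # 0 = unvisited, 1 = on DFS stack, 2 = finished

    def visit(u):
        state[u] = 1
        for w in adj[u]:
            # A back edge to a node on the stack closes a cycle.
            if state[w] == 1 or (state[w] == 0 and visit(w)):
                return True
        state[u] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in nodes)


def intervene(edges, targets):
    """Assumption: intervening on a node cuts all edges pointing into it."""
    return [(u, v) for u, v in edges if v not in targets]


def min_cycle_breaking_size(nodes, edges):
    """Brute-force the smallest intervention whose graph is acyclic."""
    for k in range(len(nodes) + 1):
        for targets in combinations(nodes, k):
            if not has_cycle(nodes, intervene(edges, set(targets))):
                return k, set(targets)


# Union of the component DAGs' edges (true edges from the Figure 1 caption).
nodes = [1, 2, 3, 4]
true_edges = [(1, 2), (2, 3), (3, 2), (3, 4), (1, 4)]

size, targets = min_cycle_breaking_size(nodes, true_edges)
print(size, targets)  # 1 {2}
```

Here the only cycle is $2 \rightarrow 3 \rightarrow 2$, so a single-node intervention suffices. The exhaustive search is exponential in the number of nodes and is only meant for toy graphs.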

Paper Structure (37 sections, 10 theorems, 40 equations, 4 figures, 2 algorithms)

Key Result

Lemma 1

Consider an inseparable pair $(i,j) \in \mathbf{E}_{\rm i}$ and an intervention $\mathcal{I} \subseteq \mathbf{V}$. We have the following identifiability guarantees using the interventional mixture distribution ${p_{{\rm m},\mathcal{I}}}(x)$.

Figures (4)

  • Figure 1: (a)-(b): sample component DAGs; (c): the mixture DAG for $\mathcal{I}=\emptyset$, where $\Delta=\{2,3,4\}$ (when the distribution of node $1$ remains the same); (d)-(e): post-intervention component DAGs for $\mathcal{I}=\{2\}$; (f): the corresponding $\mathcal{I}$-mixture DAG. The true edges are $\mathbf{E}_{\rm t}=\{(1\rightarrow 2), (2 \rightarrow 3), (3\rightarrow 2), (3 \rightarrow 4), (1 \rightarrow 4)\}$, the inseparable pairs are $\mathbf{E}_{\rm i}=\{(1-2), (1-3), (1-4), (2-3), (2-4), (3-4)\}$, and the emergent edges are $\mathbf{E}_{\rm e}=\{(1,3), (2,4)\}$.
  • Figure 2: Mean true edge recovery rates and quantification of mean cyclic complexity of a node.
  • Figure 3: Sample DAGs for a mixture of two DAGs.
  • Figure 4: Additional experiment results for true edge recovery.
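
The edge sets listed in the Figure 1 caption can be cross-checked with a few lines of Python. The sketch below only encodes the sets from the caption and confirms that, in this example, every inseparable pair is either the undirected version of a true edge or an emergent pair. The helper names are hypothetical, and reading ${\rm pa}_{\rm m}(i)$ as the set of parents of node $i$ under the true edges is an assumption, since the definition is not reproduced on this page.

```python
def skeleton(directed_edges):
    """Forget edge directions: each edge becomes an unordered pair."""
    return {frozenset(e) for e in directed_edges}


def mixture_parents(directed_edges, node):
    """Assumed reading of pa_m(i): parents of `node` in any component DAG."""
    return {u for u, v in directed_edges if v == node}


# Sets copied from the Figure 1 caption.
true_edges = {(1, 2), (2, 3), (3, 2), (3, 4), (1, 4)}
inseparable_pairs = skeleton([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])
emergent_pairs = skeleton([(1, 3), (2, 4)])

true_pairs = skeleton(true_edges)  # (2, 3) and (3, 2) collapse to one pair

# In this example the true pairs and emergent pairs split the inseparable pairs.
covers = (true_pairs | emergent_pairs) == inseparable_pairs
disjoint = true_pairs.isdisjoint(emergent_pairs)
print(covers, disjoint)  # True True

parents = {i: mixture_parents(true_edges, i) for i in (1, 2, 3, 4)}
print(parents[2], parents[3])  # {1, 3} {2}
```

Under this reading, nodes 2 and 4 each have two mixture parents, node 3 has one, and node 1 has none.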

Theorems & Definitions (15)

  • Definition 1: True edge
  • Definition 2: Inseparable pair
  • Definition 3: Emergent pair
  • Definition 4: $\Delta$-through path
  • Definition 5: $\mathcal{I}$-mixture DAG
  • Lemma 1
  • Theorem 1: Intervention sizes
  • Lemma 2
  • Theorem 2: Intervention sizes -- trees
  • Lemma 3
  • ...and 5 more