Learning Mixtures of Unknown Causal Interventions

Abhinav Kumar, Kirankumar Shiragur, Caroline Uhler

TL;DR

Interventions, whether do or soft, yield distributions that are diverse enough to recover each component of the mixture efficiently. The sample complexity required to disentangle the mixed data is inversely related to how much an intervention changes the equation governing the affected variable.

Abstract

The ability to conduct interventions plays a pivotal role in learning causal relationships among variables, thus facilitating applications across diverse scientific disciplines such as genomics, economics, and machine learning. However, in many instances within these applications, the process of generating interventional data is subject to noise: rather than data being sampled directly from the intended interventional distribution, interventions often yield data sampled from a blend of both intended and unintended interventional distributions. We consider the fundamental challenge of disentangling mixed interventional and observational data within linear Structural Equation Models (SEMs) with Gaussian additive noise without the knowledge of the true causal graph. We demonstrate that conducting interventions, whether do or soft, yields distributions with sufficient diversity and properties conducive to efficiently recovering each component within the mixture. Furthermore, we establish that the sample complexity required to disentangle mixed data inversely correlates with the extent of change induced by an intervention in the equations governing the affected variable values. As a result, the causal graph can be identified up to its interventional Markov Equivalence Class, similar to scenarios where no noise influences the generation of interventional data. We further support our theoretical findings by conducting simulations wherein we perform causal discovery from such mixed data.
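The data-generating setting described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`sample_sem`, `soft_intervention`, `sample_mixture`), the three-node chain graph, and the way a soft intervention is parametrized (rescaling the incoming edge weights of the target and replacing its noise scale) are all hypothetical choices made for the example.

```python
import random

def sample_sem(weights, noise_std, rng):
    """Draw one sample from a linear SEM with additive Gaussian noise.

    weights[i][j] is the effect of node i on node j. Nodes are assumed to be
    indexed in topological order, so only i < j can be a parent of j.
    """
    n = len(noise_std)
    x = [0.0] * n
    for j in range(n):
        mean = sum(weights[i][j] * x[i] for i in range(j))
        x[j] = mean + rng.gauss(0.0, noise_std[j])
    return x

def soft_intervention(weights, noise_std, target, scale, new_std):
    """Return a modified copy of the SEM with a changed equation for `target`.

    Hypothetical parametrization: incoming weights of `target` are multiplied
    by `scale` and its noise standard deviation is set to `new_std`.
    With scale=0.0 the target no longer depends on its parents.
    """
    new_w = [row[:] for row in weights]
    for i in range(len(new_w)):
        new_w[i][target] *= scale
    new_s = noise_std[:]
    new_s[target] = new_std
    return new_w, new_s

def sample_mixture(components, pi, num_samples, seed=0):
    """Sample from a mixture of SEMs with mixing weights pi."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(num_samples):
        k = rng.choices(range(len(components)), weights=pi)[0]
        w, s = components[k]
        data.append(sample_sem(w, s, rng))
        labels.append(k)  # hidden from the learner in the mixture problem
    return data, labels

# Assumed example graph: the chain 0 -> 1 -> 2 with unit noise.
W = [[0.0, 1.5, 0.0],
     [0.0, 0.0, -2.0],
     [0.0, 0.0, 0.0]]
S = [1.0, 1.0, 1.0]
components = [
    (W, S),                                                    # observational
    soft_intervention(W, S, target=1, scale=0.0, new_std=2.0),  # parents removed
    soft_intervention(W, S, target=2, scale=0.5, new_std=1.0),  # weakened edge
]
data, labels = sample_mixture(components, pi=[0.5, 0.25, 0.25], num_samples=1000)
```

In the problem studied here only `data` would be observed; `labels`, the component parameters, and the graph itself are what must be recovered.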

Paper Structure

This paper contains 45 sections, 10 theorems, 58 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Let $\mathbb{P}_{mix}(\bm{V})$ be a mixture of soft atomic interventions defined over a Linear-SEM with additive Gaussian noise with $n$ endogenous variables (Definition 4.1) such that the number of components $|\mathcal{I}|$ is fixed. Given Assumption assm:effectiveness is for some permutation $\rho:\{1,2,\ldots,|\mathcal{I}|\}\rightarrow \{1,2,\ldots,|\mathcal{I}|\}$ an

Figures (6)

  • Figure 1: Performance of Alg. \ref{algo:mixture_utigsp} as we vary sample size and number of nodes: The first row (a-c) shows the performance when the mixed data contains atomic interventions on all the nodes and observational data. The second row (d-f) shows the performance when the number of atomic interventions (chosen randomly) in the mixed data is half the number of nodes, along with observational data. The columns show different evaluation metrics, i.e., Parameter Estimation Error, Average Jaccard Similarity, and SHD. The symbol ($\uparrow$) indicates that higher is better, and ($\downarrow$) indicates the opposite (see Evaluation metric paragraph in §\ref{sec:empirical_results}). In summary, performance improves in both cases as the number of samples increases. However, graphs with more nodes require more samples to perform similarly. For a detailed discussion, see §\ref{sec:empirical_results_empirical}.
  • Figure 2: Other Evaluation Metrics for the simulation experiments in Fig. \ref{fig:node_var_all3}: The top row shows the metrics for the all setting (interventions on all nodes in the mixture) and the bottom row for the half setting. The first column shows the number of components estimated by our algorithm Mixture-UTIGSP. For the all setting, the actual number of components for systems with 4, 6, and 8 nodes is 5, 7, and 9 respectively (one intervention on each node plus one observational distribution). We observe that Mixture-UTIGSP correctly estimates the number of components even with a small number of samples. Similarly, for the half setting, the actual number of components for systems with 4, 6, and 8 nodes is 3, 4, and 5 respectively (interventions on half of the nodes plus one observational distribution). In this case too, Mixture-UTIGSP correctly estimates the number of components. The second column shows the error in the estimation of the mixing coefficients ($\pi_i$'s, see Definition \ref{def:mixture_of_intervention}). In both cases, the error in the estimated mixing coefficients goes to zero as the sample size increases.
  • Figure 3: Performance of Alg. \ref{algo:mixture_utigsp} as we change the cutoff ratio used for automatic component selection: We consider graphs with 6 nodes in this experiment, in the half intervention setting. In step 2 of Mixture-UTIGSP, we select the number of components using the log-likelihood curve. We scan the curve from the mixture model with the largest number of components to the smallest and stop where the relative change in the likelihood rises above a cutoff ratio (to select the elbow point of the curve). The cutoff ratio in the algorithm is chosen to be an arbitrary number close to zero. Here we compare the performance of Mixture-UTIGSP on all three metrics for the half setting of Fig. \ref{fig:node_var_all3} as we vary the cutoff ratio. We observe that for cutoff ratios close to zero, i.e., 0.01, 0.15, and 0.3, the performance remains similar, showing that the model selection criterion is robust to the chosen cutoff ratio. The number of nodes
  • Figure 4: Performance of Alg. \ref{algo:mixture_utigsp} as we change the density of the underlying true causal graph: The mixture data contains atomic interventions on all nodes as well as observational data (half setting as described in the results in §\ref{sec:empirical_results}). The columns show different evaluation metrics, i.e., Parameter Estimation Error, Average Jaccard Similarity, and SHD (see Evaluation metric paragraph in §\ref{sec:empirical_results}). In this experiment, we vary the density of the underlying causal graph by keeping each edge of a fully connected graph with a fixed probability, labeled as density in the legend of the plots (see random graph generation paragraph in §\ref{app:subsec:expt_setup} for details). The maximum possible density is 1, i.e., the probability of keeping an edge is 1, corresponding to a fully connected graph, and the lowest possible density is 0. We observe that as the density of the graph increases, more samples are required to match the performance of less dense graphs on all three metrics. Our Theorem \ref{theorem:main_identifiability} shows that the sample complexity required for estimating the parameters of the mixture is proportional to the norm of the adjacency matrix $\lVert A \rVert$, and $\lVert A \rVert$ increases with the density of the graph. Thus, as the density increases, we require more samples to achieve a similar performance in estimating the parameters of the mixture, as seen in the parameter estimation error plot.
  • Figure 5: Ground truth and estimated causal graphs for the protein signaling dataset \cite{sachs_protein_signalling_dataset}: Fig. \ref{fig:ground-graph} is the graph created with the help of domain experts for this problem \cite{igsp2017}. Fig. \ref{fig:estimated-graph-ours} shows the graph estimated by our Mixture-UTIGSP, and Fig. \ref{fig:estimated-graph-oracle} is the graph estimated by the oracle UT-IGSP when it is given the ground truth disentangled mixture. The blue arrows in 1b and 1c show the edges of the domain expert graph that are correctly recovered. Green shows edges that match the skeleton of the domain expert graph but have the reversed direction. Red shows edges that are incorrectly added in the estimated graph. We observe that Mixture-UTIGSP correctly identifies two more edges (PKA->ERK and PKA->Akt) than the oracle, which could be due to randomness in the UT-IGSP algorithm. For this estimation, the best-performing cutoff of 0.01 was selected (see Table \ref{tbl:sachs_combined}).
  • ...and 1 more figure
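The component-selection rule described in the Figure 3 caption (scan the log-likelihood curve from the largest model downward and stop at the first large relative change) can be sketched as follows. This is only an illustration under assumptions: the function name `select_num_components`, the definition of relative change as the absolute difference divided by the magnitude of the larger model's log-likelihood, and the log-likelihood values in the example are all hypothetical and not taken from the paper.

```python
def select_num_components(log_likelihoods, cutoff_ratio=0.01):
    """Pick the elbow of a log-likelihood curve.

    log_likelihoods maps a number of components k to the log-likelihood of
    the fitted mixture with k components. The curve is scanned from the
    largest k to the smallest; the scan stops at the first step whose
    relative change exceeds cutoff_ratio and returns the larger k of that step.
    """
    ks = sorted(log_likelihoods, reverse=True)
    for k_large, k_small in zip(ks, ks[1:]):
        ll_large = log_likelihoods[k_large]
        ll_small = log_likelihoods[k_small]
        # Assumed definition of relative change; guard against division by zero.
        denom = abs(ll_large) if ll_large != 0 else 1.0
        rel_change = abs(ll_large - ll_small) / denom
        if rel_change > cutoff_ratio:
            return k_large
    # No step exceeded the cutoff: fall back to the smallest model.
    return ks[-1]

# Made-up curve that flattens out after 3 components.
example_curve = {1: -5000.0, 2: -4200.0, 3: -3600.0,
                 4: -3590.0, 5: -3585.0, 6: -3583.0}
chosen_k = select_num_components(example_curve, cutoff_ratio=0.01)
```

On this made-up curve the steps from 6 down to 3 components each change the log-likelihood by well under 1%, while dropping from 3 to 2 changes it by about 17%, so the scan stops at 3 components.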

Theorems & Definitions (22)

  • Definition 4.1: Mixture of Soft Atomic Interventions
  • Theorem 4.1: Identifiability of Mixture Parameters
  • Corollary 4.1.1: Mixture-MEC
  • Proof
  • Remark
  • Lemma 5.0
  • Remark
  • Remark
  • Definition 5.1
  • Theorem 5.1: Theorem 3.1 in \cite{Belkin2010PolynomialLO}
  • ...and 12 more