Discovering Mixtures of Structural Causal Models from Time Series Data

Sumanth Varambally, Yi-An Ma, Rose Yu

TL;DR

This work tackles causal discovery when time-series data come from a mixture of unknown SCMs, introducing MCD, an end-to-end variational framework that learns $K$ complete SCMs and per-sample mixture memberships from $N$ time-series samples. It provides two instantiations, MCD-Linear and MCD-Nonlinear, both trained with an ELBO objective that uses variational posteriors over the SCMs and over sample-level mixture membership, with differentiable edge sampling via Gumbel-Softmax. Theoretical contributions include identifiability results for mixtures of linear SVARs with equal-variance Gaussian noise, a sufficient condition for identifiability of general SCM mixtures, and a formal link between the ELBO and the true log-likelihood. Empirically, MCD outperforms state-of-the-art baselines on synthetic and real-world heterogeneous datasets, accurately clustering samples by underlying causal graph and recovering multiple SCMs, with implications for domains such as finance, climate science, and neuroscience.

Abstract

Discovering causal relationships from time series data is significant in fields such as finance, climate science, and neuroscience. However, contemporary techniques rely on the simplifying assumption that data originates from the same causal model, while in practice, data is heterogeneous and can stem from different causal models. In this work, we relax this assumption and perform causal discovery from time series data originating from a mixture of causal models. We propose a general variational inference-based framework called MCD to infer the underlying causal models as well as the mixing probability of each sample. Our approach employs an end-to-end training process that maximizes an evidence lower bound (ELBO) for the data likelihood. We present two variants: MCD-Linear for linear relationships and independent noise, and MCD-Nonlinear for nonlinear causal relationships and history-dependent noise. We demonstrate that our method surpasses state-of-the-art benchmarks in causal discovery tasks through extensive experimentation on synthetic and real-world datasets, particularly when the data emanates from diverse underlying causal graphs. Theoretically, we prove the identifiability of such a model under some mild assumptions.

Paper Structure

This paper contains 40 sections, 9 theorems, 58 equations, 22 figures, and 4 tables.

Key Result

Proposition 1

Under the data generation process described in Figure 2, the data likelihood admits an evidence lower bound (ELBO).
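
A bound of this kind can be sketched in the standard mean-field form, assuming the factorized variational family $q_\phi(\mathcal{M}_{1:K}) \prod_{n} r_\psi(Z^{(n)} \mid X^{(n)})$ that appears in the Figure 3 caption. The priors $p(\mathcal{M}_{1:K})$ and $p(Z^{(n)})$ and the entropy notation $H(\cdot)$ are assumptions introduced here for illustration, so the paper's exact statement may group or weight the terms differently.

```latex
\begin{aligned}
\log p_\theta\left(X^{(1:N)}\right) \;\geq\;&
\mathbb{E}_{q_\phi(\mathcal{M}_{1:K})}\left[
  \sum_{n=1}^{N} \mathbb{E}_{r_\psi(Z^{(n)} \mid X^{(n)})}\left[
    \log p_\theta\left(X^{(n)} \mid \mathcal{M}_{Z^{(n)}}\right)
    + \log p\left(Z^{(n)}\right)
  \right]
  + \log p\left(\mathcal{M}_{1:K}\right)
\right] \\
&+ H\left(q_\phi(\mathcal{M}_{1:K})\right)
 + \sum_{n=1}^{N} H\left(r_\psi(Z^{(n)} \mid X^{(n)})\right)
\end{aligned}
```

This follows from Jensen's inequality applied to $\log \mathbb{E}_{q_\phi r_\psi}\left[ p_\theta(X, Z, \mathcal{M}) / (q_\phi r_\psi) \right]$, with the joint factorized according to the graphical model in which each $X^{(n)}$ depends only on the SCM selected by $Z^{(n)}$.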

Figures (22)

  • Figure 1: MCD discovers multiple causal graphs from time-series data by determining the mixture component membership for each sample and inferring one graph per mixture component.
  • Figure 2: Probabilistic graphical model diagram of a mixture of SCMs. Shaded circles are observed and hollow circles are latent.
  • Figure 3: Overview of how the ELBO of Proposition 1 is calculated for MCD-Linear (left) and MCD-Nonlinear (right). Given time-series data $\left\{ X_{t-L}^{1:D, (n)}\right\}_{n=1}^N$, and a DAG sample $\mathcal{G}_{1:K}$ from the variational distribution $q_\phi(\mathcal{M}_{1:K})$, we calculate the likelihood of the data under all $K$ causal models. The likelihood is weighted by the mixing probabilities $\left\{ r_\psi \left( Z^{(n)} \mid X^{(n)}\right) \right\}_{n=1}^N$ to calculate the ELBO.
  • Figure 4: Results on the nonlinear (top) and linear (bottom) synthetic datasets for dimension $D=5, 10, 20$. We report the orientation F1 scores. (-s) indicates that the baseline predicts one graph per sample. (-g) signifies that the baseline was executed on samples grouped according to the ground truth causal graph. These methods use additional information that MCD does not. Average of 5 runs reported.
  • Figure 5: Clustering accuracy of MCD-Nonlinear on the nonlinear synthetic datasets (left) and MCD-Linear on the linear synthetic datasets (right), as a function of the true number of causal graphs $K^*$. The accuracy is averaged across 5 runs and data dimensionalities $D=5, 10, 20$. The hyperparameter $K$ is set to $2K^*$ for all settings.
  • ...and 17 more figures
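
The weighting step described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. It assumes the per-sample log-likelihoods under each of the $K$ models have already been computed; the function names, the `membership_logits` parameterization of $r_\psi$, and the toy numbers are all hypothetical and are not taken from the paper's implementation. DAG sampling and the likelihood models themselves are omitted.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def weighted_log_likelihood(log_lik, membership_logits):
    """Mixture-weighted likelihood term and membership entropy.

    log_lik:           (N, K) array, entry [n, k] is the log-likelihood of
                       sample n under causal model k (assumed precomputed).
    membership_logits: (N, K) array of unnormalized scores standing in for
                       the variational membership distribution r_psi.
    """
    r = softmax(membership_logits, axis=1)       # mixing probabilities per sample
    weighted = (r * log_lik).sum()               # sum_n E_r[log p(X^(n) | M_k)]
    entropy = -(r * np.log(r + 1e-12)).sum()     # sum_n H(r(Z^(n) | X^(n)))
    return weighted, entropy, r

# Toy example: 3 samples, 2 candidate models, uniform memberships.
log_lik = np.array([[-1.0, -5.0],
                    [-4.0, -0.5],
                    [-2.0, -2.0]])
membership_logits = np.zeros((3, 2))

weighted, entropy, r = weighted_log_likelihood(log_lik, membership_logits)
hard_assignment = r.argmax(axis=1)               # cluster label per sample
```

In an end-to-end setup of the kind the abstract describes, both quantities would be part of a differentiable objective, so an autodiff framework would replace plain NumPy; the sketch only shows how the terms are combined.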

Theorems & Definitions (15)

  • Proposition 1
  • Theorem 2: Identifiability of linear SVARs with equal-variance additive Gaussian noise
  • Theorem 3: Identifiability of finite mixture of causal models
  • Proposition 1
  • proof
  • Definition 1: Identifiability
  • Definition 2: Identifiability of finite mixtures
  • Theorem A: Identifiability of finite mixtures of distributions (Yakowitz and Spragins, 1968)
  • Proposition A: Identifiability of mixture of multivariate Gaussian distributions (Yakowitz and Spragins, 1968)
  • Theorem B: Identifiability of linear SCMs with equal-variance additive Gaussian noise
  • ...and 5 more