
Causal Discovery in Semi-Stationary Time Series

Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu

TL;DR

This work tackles causal discovery from observational multivariate time series under non-stationarity by focusing on semi-stationary processes, whose causal mechanisms repeat periodically. It introduces PCMCI$_{\Omega}$, a non-parametric, constraint-based extension of PCMCI that searches over candidate periods up to $\omega_{\text{ub}}$, partitions the time points of each variable $\mathbf{X}^{j}$ into subsets $\Pi^{j}_{k}$, and performs conditional independence (CI) tests within these subsets, aiming to recover the true causal graph while also identifying the per-variable periods $\omega_j$ and the global period $\Omega=\mathrm{LCM}(\{\omega_j\})$. The authors prove soundness under standard causal assumptions A1–A7 and provide lemmas showing that the method recovers the true parents from a potentially denser CI-derived parent set and that the periodic structure can be identified in the limit of infinite data. Empirical results on continuous and discrete data, plus a climate case study, demonstrate the method's ability to detect periodic causal mechanisms and relax the stationarity assumption; code is provided to reproduce the experiments. Overall, PCMCI$_{\Omega}$ offers a principled, non-parametric approach for discovering causal structure in time series whose mechanism changes recur periodically, broadening applicability to real-world domains with seasonality and diurnal variation.
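As a small illustration of the bookkeeping described above, the sketch below computes the global period $\Omega=\mathrm{LCM}(\{\omega_j\})$ and splits one variable's time points into $\omega_j$ subsets. The indexing of $\Pi^{j}_{k}$ is an assumption made for illustration (the formal definition is Definition 2.3 in the paper): time runs from 1, the first $\tau_{\max}$ slices are treated as starting points as in Figure 1, and subset $k$ collects every $\omega_j$-th time point after them. The helper names `global_period` and `time_partitions` are hypothetical.

```python
from math import lcm  # math.lcm accepts any number of ints (Python 3.9+)


def global_period(omegas):
    """Global period Omega = LCM({omega_j}) of the per-variable periods."""
    return lcm(*omegas)


def time_partitions(omega_j, tau_max, T):
    """HYPOTHETICAL indexing of the subsets Pi^j_k for a single variable j.

    Assumes time runs 1..T, the first tau_max slices are starting points,
    and mechanism k (1-based) applies at every t > tau_max with
    (t - tau_max - 1) % omega_j == k - 1.
    """
    parts = {k: [] for k in range(1, omega_j + 1)}
    for t in range(tau_max + 1, T + 1):
        parts[(t - tau_max - 1) % omega_j + 1].append(t)
    return parts


# Periods from Figure 1: omega_1=3, omega_2=2, omega_3=1, with tau_max=3
print(global_period([3, 2, 1]))             # 6
print(time_partitions(3, tau_max=3, T=12))
# {1: [4, 7, 10], 2: [5, 8, 11], 3: [6, 9, 12]}
```

With the Figure 1 periods this reproduces $\Omega=6$. Under the assumed indexing, $\mathbf{X}^{1}$ gets three subsets $\Pi^{1}_{1},\Pi^{1}_{2},\Pi^{1}_{3}$ of interleaved time points, one per mechanism.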

Abstract

Discovering causal relations from observational time series without making the stationarity assumption is a significant challenge. In practice, this challenge arises in many areas, such as retail sales, transportation systems, and medical science. Here, we consider this problem for a class of non-stationary time series. The structural causal model (SCM) of this type of time series, called semi-stationary time series, specifies that a finite number of different causal mechanisms occur sequentially and periodically across time. This model holds considerable practical utility because it can represent periodicity, including common occurrences such as seasonality and diurnal variation. We propose a constraint-based, non-parametric algorithm for discovering causal relations in this setting. The resulting algorithm, PCMCI$_{\Omega}$, can capture the alternating and recurring changes in the causal mechanisms and then identify the underlying causal graph with conditional independence (CI) tests. We show that this algorithm is sound in identifying causal relations on discrete time series. We validate the algorithm with extensive experiments on continuous and discrete simulated data. We also apply our algorithm to a real-world climate dataset.
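To see why CI tests should be run within time partitions rather than on the pooled series, here is a small simulation sketch. It is not the authors' implementation. It assumes a single lagged edge $X^{2}_{t-1} \to X^{1}_{t}$ that is active in only one of two alternating mechanisms ($\omega_1 = 2$), and it uses partial correlation with Fisher's z-transform as a stand-in CI test, conditioning on $X^{1}_{t-1}$. The coefficient 0.8, the series length, the seed, and the helper names `parcorr_test` and `run_demo` are illustrative assumptions.

```python
import math

import numpy as np


def parcorr_test(x, y, Z=None):
    """Partial-correlation CI test of x _||_ y | Z with a Fisher z p-value."""
    n = len(x)
    k = 0 if Z is None else Z.shape[1]
    if k > 0:
        # Residualize x and y on an intercept plus the conditioning set Z
        A = np.column_stack([np.ones(n), Z])
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = float(np.corrcoef(x, y)[0, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def run_demo(T=4000, seed=0):
    rng = np.random.default_rng(seed)
    omega = 2
    coef = [0.8, 0.0]  # mechanism 1 has the edge X2_{t-1} -> X1_t, mechanism 2 does not
    x2 = rng.standard_normal(T)
    x1 = np.zeros(T)
    for t in range(1, T):
        x1[t] = coef[(t - 1) % omega] * x2[t - 1] + rng.standard_normal()
    ts = np.arange(1, T)
    results = {}
    for k in range(omega):
        idx = ts[(ts - 1) % omega == k]  # time points governed by mechanism k+1
        results[f"phase_{k + 1}"] = parcorr_test(x1[idx], x2[idx - 1], Z=x1[idx - 1][:, None])
    # Pooled test over all time points, ignoring the periodic structure
    results["pooled"] = parcorr_test(x1[ts], x2[ts - 1], Z=x1[ts - 1][:, None])
    return results


for name, (r, p) in run_demo().items():
    print(f"{name}: r={r:.3f}, p={p:.2g}")
```

Within the first partition the test should report a strong dependence, and within the second it should not. The pooled test still detects a dependence, but a diluted one, and it cannot reveal that the edge is absent at every other time step.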

Paper Structure

This paper contains 25 sections, 9 theorems, 66 equations, 9 figures, 1 table, 4 algorithms.

Key Result

Theorem 3.1

Let $\hat{\mathcal{G}}$ be the graph estimated by Algorithm $\text{PCMCI}_\Omega$ and $\mathcal{G}$ the true causal graph. Under assumptions A1–A7 and with an oracle (infinite sample size limit), we have $\hat{\mathcal{G}} = \mathcal{G}$ almost surely.

Figures (9)

  • Figure 1: Partial causal graph for the 3-variate time series $V=\{\mathbf{X}^{1}, \mathbf{X}^{2}, \mathbf{X}^{3}\}$ with a Semi-Stationary SCM where $\tau_{\max}=3$, $\omega_{1}=3$, $\omega_{2}=2$, $\omega_{3}=1$, $\Omega=6$ and $\delta=6$. The first 3 ($=\tau_{\max}$) time slices $\{\mathbf{X}_{t}\}_{1\leq t \leq 3}$ are the starting points. Edges of the same color represent the same causal mechanism. For example, $\mathbf{X}^{1}$ has 3 ($=\omega_{1}$) time partition subsets $\{\Pi^{1}_{k}\}_{1\leq k \leq 3}$. Nodes $X^{1}_{t}$ with the same fill color have time points $t$ in the same time partition subset, and nodes that share both fill color and outline shape have time points in the same homogeneous time partition subset (the definitions are in the supplementary material). There are 6 ($=\delta$) different Markov chains in $V$; their first elements, $\{Z^{q}_{1}\}_{1\leq q\leq 6}$, are tinted with a gradient of blue hues. The superscript $q$ of $Z^{q}_{i}$ indexes the Markov chain, and the subscript $i$ is the running index within that chain. For instance, $Z^{1}_{1}$ and $Z^{1}_{2}$ denote the first two elements of the first Markov chain, while $Z^{2}_{1}$ and $Z^{2}_{2}$ denote the first two elements of the second Markov chain.
  • Figure 2: $\text{PCMCI}_\Omega$ tested on 5-variate time series with $\tau_{\max}=5$, setting $\tau_\text{ub}=15$ and $\omega_{\text{ub}}=15$ for all variables. Each line corresponds to a different time series length, and each marker is the average accuracy rate or average running time over 100 trials. a) Accuracy rate of $\hat{\omega}$ for different time series lengths and different $\omega_{\max}$. b) Runtime (in seconds) as $\omega_{\max}$ varies.
  • Figure 3: Four algorithms tested on 5-variate time series, setting $\tau_\text{ub}=15$ and $\omega_{\text{ub}}=15$ for all variables. Each line corresponds to a different algorithm, and each marker is the average performance over 100 trials.
  • Figure 4: With $\tau_{\text{ub}}$ set to 5, all parent candidates of variables at $t=15$ lie in the large orange box spanning $t=10$ to $t=14$, so the algorithm only examines causal effects with a time lag of at most 5. In the causal graph, $\tau_{\max}=3$ is the maximum time lag in the 3-variate time series: the maximum lags of the component series are $\tau_1=2$, $\tau_2=3$, and $\tau_3=1$, and $\tau_{\max}$ is the largest of the three.
  • Figure 5: Partial causal graph for the 3-variate time series $V=\{\mathbf{X}^{1}, \mathbf{X}^{2}, \mathbf{X}^{3}\}$ with a Semi-Stationary SCM where $\tau_{\max}=3$, $\omega_{1}=3$, $\omega_{2}=2$, $\omega_{3}=1$, $\Omega=6$ and $\delta=6$. The first 3 ($=\tau_{\max}$) time slices $\{\mathbf{X}_{t}\}_{1\leq t \leq 3}$ are the starting points. Edges of the same color denote the same causal mechanism. For example, $\mathbf{X}^{1}$ has 3 ($=\omega_{1}$) time partition subsets $\{\Pi^{1}_{k}\}_{1\leq k \leq 3}$. Nodes $X^{1}_{t}$ with the same fill color have time points $t$ in the same time partition subset, and nodes that share both fill color and outline shape have time points in the same homogeneous time partition subset. There are 6 ($=\delta$) different Markov chains in $V$; their first elements, $\{Z^{q}_{1}\}_{1\leq q\leq 6}$, are tinted with a gradient of blue hues. $Z^{1}_{1}$ and $Z^{1}_{2}$ denote the first two elements of the first Markov chain, while $Z^{2}_{1}$ and $Z^{2}_{2}$ denote the first two elements of the second Markov chain.
  • ...and 4 more figures
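The candidate window in Figure 4 can be enumerated directly. The sketch below lists every lagged candidate parent $X^{i}_{t-\tau}$ with $1 \le \tau \le \tau_{\text{ub}}$. It assumes, as the $t=10$ to $t=14$ box suggests, that only lagged (not contemporaneous) candidates are considered, that variables are numbered $1..n$, and that time starts at 1. The function name `lagged_candidates` is hypothetical.

```python
def lagged_candidates(t, n_vars, tau_ub):
    """Candidate parents X^i_{t - tau} for 1 <= tau <= tau_ub (lagged only).

    Variables are numbered 1..n_vars and time starts at 1; candidates that
    would fall before t = 1 are dropped.
    """
    return [
        (i, t - tau)
        for tau in range(1, tau_ub + 1)
        for i in range(1, n_vars + 1)
        if t - tau >= 1
    ]


# Figure 4 setting: 3 variables, tau_ub = 5, target time t = 15
cands = lagged_candidates(t=15, n_vars=3, tau_ub=5)
times = sorted({s for _, s in cands})
print(len(cands), times)  # 15 [10, 11, 12, 13, 14]
```

Because the window only reaches back $\tau_{\text{ub}}$ steps, a true parent can be found only if $\tau_{\max} \le \tau_{\text{ub}}$. In Figure 4, $\tau_{\max}=3 \le 5$, so every true lagged parent falls inside the orange box.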

Theorems & Definitions (24)

  • Definition 2.1: Non-Stationary SCM
  • Definition 2.2: Semi-Stationary SCM
  • Definition 2.3: Time Partition
  • Definition 2.4: Illusory Parent Sets
  • Definition 2.5: Time Series as a Markov Chain
  • Theorem 3.1
  • Lemma 3.2
  • Proof sketch
  • Lemma 3.3
  • Proof sketch
  • ...and 14 more