Modeling Time-evolving Causality over Data Streams

Naoki Chihara, Yasuko Matsubara, Ren Fujiwara, Yasushi Sakurai

TL;DR

The proposed model outperforms state-of-the-art methods in both discovering time-evolving causality and forecasting. Its computation time does not depend on data stream length, so it is applicable to very large sequences.

Abstract

Given an extensive, semi-infinite collection of multivariate co-evolving data sequences (e.g., sensor/web activity streams) whose observations influence each other, how can we discover the time-changing cause-and-effect relationships in co-evolving data streams? How efficiently can we reveal dynamical patterns that allow us to forecast future values? In this paper, we present a novel streaming method, ModePlait, which is designed for modeling such causal relationships (i.e., time-evolving causality) in multivariate co-evolving data streams and forecasting their future values. The solution relies on characteristics of the causal relationships that evolve over time in accordance with the dynamic changes of exogenous variables. ModePlait has the following properties: (a) Effective: it discovers the time-evolving causality in multivariate co-evolving data streams by detecting the transitions of distinct dynamical patterns adaptively. (b) Accurate: it enables both the discovery of time-evolving causality and the forecasting of future values in a streaming fashion. (c) Scalable: our algorithm does not depend on data stream length and thus is applicable to very large sequences. Extensive experiments on both synthetic and real-world datasets demonstrate that our proposed model outperforms state-of-the-art methods in terms of discovering the time-evolving causality as well as forecasting.

Paper Structure

This paper contains 23 sections, 3 theorems, 16 equations, 6 figures, 5 tables, 2 algorithms.

Key Result

Lemma 1

The time complexity of RegimeCreation is $O(N(d^2+h^2)+k^3)$, where $k=\max_i(k_i)$. Please see the Appendix for details.

Figures (6)

  • Figure 1: Modeling power of ModePlait over an epidemiological data stream (i.e., #1 covid19): The original stream consists of daily COVID-19 infection numbers in five major countries. Our proposed method can (a) discover the causal relationships, which change over time, (b) extract the eigenvalues of the latent dynamics, which provide insight into those dynamics in terms of decay rate and temporal frequency, and (c) forecast future values in a streaming fashion.
  • Figure 2: Illustration of ModePlait: (a) we extract the latent temporal dynamics from the $i$-th univariate inherent signal $\bm{e}_{(i)}$, which behaves as a dynamical system. (b) The multivariate time series is described by mixing matrix $\bm{W}^{-1}$ and a collection of $d$ self-dynamics factor sets $\{ \mathcal{D}_{(1)} , ..., \mathcal{D}_{(d)} \}$. Although the mixing matrix $\bm{W}^{-1}$ is not the same matrix as the causal adjacency matrix $\bm{B}$, it is instrumental in identifying the time-evolving causality.
  • Figure 5: Overview of ModePlait algorithm: Given a data stream $\bm{X}$, it performs all the following tasks at every time point $t_c$. First, it searches for the best regime $\boldsymbol{\theta}^c$ for the current window $\bm{X}^c$. It then forecasts the $l_s$-steps-ahead future value, i.e., $\bm{v}(t_c+l_s)$, using that best regime. When the algorithm encounters an unknown pattern in $\bm{X}^c$, it estimates a new regime $\boldsymbol{\theta}$ and inserts it into $\boldsymbol{\Theta}$.
  • Figure 6: Scalability of ModePlait: (left) Wall clock time vs. data stream length $t_c$ and (right) average time consumption for (#4) exercise. The vertical axes of these graphs use a logarithmic scale. ModePlait is up to 1,500x faster than its competitors.
  • Figure : ModePlait($\bm{x}(t_c), \mathcal{F}, \mathcal{C}$)
  • ...and 1 more figure
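
The caption of Figure 1 says that the eigenvalues of the latent dynamics can be read in terms of decay rate and temporal frequency. The sketch below shows the standard conversion for a discrete-time linear system, not the authors' implementation. The function name `describe_modes`, the sampling interval `dt`, and the toy 7-step damped rotation are assumptions introduced here for illustration.

```python
import numpy as np

def describe_modes(eigenvalues, dt=1.0):
    """Convert discrete-time eigenvalues into (rate, frequency) pairs.

    rate      = ln|lambda| / dt          (negative: decay, positive: growth)
    frequency = arg(lambda) / (2*pi*dt)  (cycles per unit time)
    """
    lam = np.asarray(eigenvalues, dtype=complex)
    rate = np.log(np.abs(lam)) / dt
    freq = np.angle(lam) / (2.0 * np.pi * dt)
    return rate, freq

# Hypothetical toy system: a rotation with a 7-step period, damped by 0.95 per step.
period = 7.0
damping = 0.95
theta = 2.0 * np.pi / period
A = damping * np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])

lam = np.linalg.eigvals(A)          # complex-conjugate pair damping * exp(+-i*theta)
rate, freq = describe_modes(lam)
print("rate per step:", rate)
print("frequency (cycles per step):", freq)
```

A conjugate pair of eigenvalues yields one shared rate and two frequencies of opposite sign. Eigenvalues inside the unit circle correspond to decaying modes, and those outside it to growing ones.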

Theorems & Definitions (10)

  • Definition 1: Inherent signals: $\bm{E}$
  • Definition 2: Self-dynamics factor set: $\mathcal{D}_{(i)}$
  • Definition 3: Single regime parameter set: $\boldsymbol{\theta}$
  • Definition 4: Regime set: $\boldsymbol{\Theta}$
  • Definition 5: Time-evolving causality: $\mathcal{B}$
  • Lemma 1: Time complexity of RegimeCreation
  • Definition 6: Update parameter: $\boldsymbol{\omega}$
  • Definition 7: Full parameter set: $\mathcal{F}$
  • Lemma 2: Causal identifiability
  • Lemma 3: Time complexity of ModePlait
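
The regime set $\boldsymbol{\Theta}$ and the overview in Figure 5 describe a select-or-create loop: find the best regime for the current window, forecast with it, and add a new regime when the pattern is unknown. The sketch below illustrates only that control flow. It is a simplified stand-in, not ModePlait itself: each regime here is a plain least-squares linear transition matrix, and the names `fit_regime`, `regime_error`, `step`, and the threshold `tau` are assumptions made for this example.

```python
import numpy as np

def fit_regime(window):
    """Least-squares fit of A in x[t+1] ~= A @ x[t]; window has shape (T, d)."""
    past, future = window[:-1], window[1:]
    # lstsq solves past @ M = future, so the transition matrix is A = M.T
    M, *_ = np.linalg.lstsq(past, future, rcond=None)
    return M.T

def regime_error(A, window):
    """Mean squared one-step prediction error of regime A on the window."""
    pred = window[:-1] @ A.T
    return float(np.mean((window[1:] - pred) ** 2))

def step(window, regimes, tau, l_s):
    """Process one window: select or create a regime, then forecast l_s steps ahead."""
    errors = [regime_error(A, window) for A in regimes]
    if not errors or min(errors) > tau:
        regimes.append(fit_regime(window))   # unknown pattern: insert a new regime
        best = len(regimes) - 1
    else:
        best = int(np.argmin(errors))        # known pattern: reuse the best regime
    forecast = np.linalg.matrix_power(regimes[best], l_s) @ window[-1]
    return best, forecast

def rotation(theta):
    """2x2 rotation matrix, used here only to generate toy data."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def simulate(A, x0, steps):
    """Noise-free trajectory x[t+1] = A @ x[t] of the given length."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps - 1):
        xs.append(A @ xs[-1])
    return np.stack(xs)

# Toy stream: pattern 1, then pattern 2, then pattern 1 again.
w1 = simulate(rotation(0.3), [1.0, 0.0], 20)
w2 = simulate(rotation(1.2), [1.0, 0.0], 20)

regimes = []
ids = [step(w, regimes, tau=1e-6, l_s=3)[0] for w in (w1, w2, w1)]
best, forecast = step(w1, regimes, tau=1e-6, l_s=3)
print("selected regime per window:", ids)
print("number of stored regimes:", len(regimes))
```

Because each window is compared only against the stored regimes, the per-window cost in this sketch depends on the window length and the number of regimes rather than on how much of the stream has already been seen, which mirrors the scalability property claimed for the method.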