Granger Causality using Neural Networks

Malik Shahid Sultan, Samuel Horvath, Hernando Ombao

TL;DR

This work addresses the challenge of inferring Granger causality in nonlinear, high-dimensional time series by introducing Learned Kernel VAR (LeKVAR), where a neural network learns a shared kernel $\xi_{\theta}(\cdot)$ and GC structure is read from the learned cross-channel weights $A^{(h)}$. It further advances GC estimation with a decoupled lag–time series penalties framework that separates lag relevance and component importance, enabling direct lag extraction during GC inference. A key technical contribution is addressing a degenerate learning objective with a weight-normalization reparameterization and integrating penalties into standard deep-learning optimizers, allowing mini-batching and compatibility with RNNs and transformers. Experiments on VAR(3), Lorenz-96, and Netsim fMRI demonstrate competitive GC recovery and favorable computation times, illustrating the practical viability and scalability of NN-based GC with decoupled lag inference for neuroscience and complex dynamical systems. The framework is broadly extensible to various neural architectures and provides a path toward more interpretable, data-driven GC discovery in nonlinear multivariate time series.

Abstract

Dependence between nodes in a network is an important concept that pervades many areas, including finance, politics, sociology, genomics, and the brain sciences. One way to characterize dependence between the components of a multivariate time series is via Granger Causality (GC). Traditional approaches to GC estimation and inference commonly assume linear dynamics; however, this simplification does not hold in many real-world applications where signals are inherently nonlinear. In such cases, imposing linear models such as vector autoregressive (VAR) models can mischaracterize the true Granger-causal interactions. To overcome this limitation, Tank et al. (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022) proposed a solution that uses neural networks with sparse regularization penalties. The regularization encourages the learnable weights to be sparse, which enables inference on GC. This paper overcomes the limitations of current methods by leveraging advances in machine learning and deep learning, which have been shown to learn hidden patterns in data. We propose novel classes of models that handle underlying nonlinearity in a computationally efficient manner while simultaneously providing GC estimation and lag order selection. First, we present the Learned Kernel VAR (LeKVAR) model, which learns a kernel parameterized by a shared neural network, followed by penalization of the learnable weights to discover the GC structure. Second, we show that lags and individual time series importance can be decoupled directly via decoupled penalties. This is important because we want to select the lag order during GC estimation. The decoupling acts as a filter and can be extended to any DL model, including Multi-Layer Perceptrons (MLP), Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), and Transformers, for simultaneous GC estimation and lag selection.
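
As a reading aid, the LeKVAR structure described in the abstract can be written schematically as below. This is a sketch under stated assumptions rather than the paper's exact formulation: the maximum lag $K$, the dimension $D$, the noise term $\varepsilon_{t,i}$, and the elementwise application of the kernel are assumed here, while $\xi_{\theta}$ and $A^{(h)}$ are the shared kernel and cross-channel weights named in the TL;DR.

$$
x_{t,i} \;=\; \sum_{h=1}^{K} \sum_{j=1}^{D} A^{(h)}_{ij}\, \xi_{\theta}\!\left(x_{t-h,j}\right) \;+\; \varepsilon_{t,i}
$$

Under this form, series $j$ does not Granger-cause series $i$ when $A^{(h)}_{ij} = 0$ for every lag $h$, which is why a sparsity penalty on the weights can expose the GC structure.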

Paper Structure (18 sections, 18 equations, 3 figures, 9 tables)

Figures (3)

  • Figure 1: In LeKVAR, the input is first transformed using a shared learned kernel, displayed as a small purple neural network. Each transformed input is then linearly combined to form the final output. The round output neurons ($x_{t,i}$) represent the summation of incoming connections. We say that series $j$ does not Granger-cause series $i$ if the outgoing weights for series $j$, shown as dark arrows connecting gray squares with the output, are zero. We say that lag $h$ does not Granger-cause series $i$ if the outgoing weights for the lagged inputs $[x_{t-h,1}, x_{t-h,2}, \ldots, x_{t-h,D}]$, shown as dark arrows connecting gray squares with the output, are zero.
  • Figure 2: A schematic of the decoupling of lags and time series components to decide Granger causality. Each input element is first multiplied by its corresponding lag and component penalty, i.e., $x_{t-h, d}$ is multiplied by the product $v_d t_h$, displayed as a small purple neural network. The output is then obtained from the component-wise neural network model applied to the scaled input. We say that series $j$ does not Granger-cause series $i$ if the corresponding $v_j$ in the component-wise neural network that predicts $x_{t, i}$, shown as a blue circle, is zero. We say that lag $h$ does not Granger-cause series $i$ if the corresponding $t_h$ in the component-wise neural network that predicts $x_{t, i}$, shown as a green circle, is zero.
  • Figure 3: Comparison of the learned Granger causality coefficients $t_i$ for a 10-dimensional VAR dataset using the cLSTMwF model. (Left) Ground-truth lags are $1$, $2$, and $3$; (right) ground-truth lags are $3$, $4$, and $5$. This lag information can be extracted only from cLSTMwF models, not from cLSTM.
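
The zero-weight criteria in the captions for Figures 1 and 2 can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function names, the tiny tanh network standing in for the shared kernel, the array layout (`A[h-1, i, j]` for lag $h$, target $i$, source $j$), and the tolerance `tol` are all assumptions made for illustration, and no training or penalty optimization is shown.

```python
import numpy as np

def shared_kernel(x, w1, b1, w2):
    """Hypothetical stand-in for the shared kernel: a scalar-in, scalar-out
    one-hidden-layer tanh network applied elementwise to every input."""
    hidden = np.tanh(x[..., None] * w1 + b1)  # shape (..., H)
    return hidden @ w2                        # same shape as x

def lekvar_predict(lags, A, w1, b1, w2):
    """LeKVAR-style prediction (assumed form).
    lags[h-1, j] holds x_{t-h, j}; A[h-1, i, j] weights source j at lag h for target i."""
    z = shared_kernel(lags, w1, b1, w2)       # (K, D) transformed inputs
    return np.einsum("hij,hj->i", A, z)       # (D,) linear combination per target

def granger_matrix(A, tol=1e-8):
    """Entry [i, j] is True if series j has a nonzero weight into series i at any lag."""
    return np.abs(A).max(axis=0) > tol

def active_lags(A, i, tol=1e-8):
    """1-based lags with at least one nonzero weight into target series i."""
    return np.flatnonzero(np.abs(A[:, i, :]).max(axis=1) > tol) + 1

def decoupled_scale(lags, v, t):
    """Decoupled scaling for ONE target series: x_{t-h, d} is multiplied by v_d * t_h.
    A zero v_d removes series d; a zero t_h removes lag h."""
    return lags * np.outer(t, v)              # (K, D)

# Small illustrative setup with hand-set weights (assumed values, not learned ones).
rng = np.random.default_rng(0)
K, D, H = 2, 3, 4
w1, b1, w2 = rng.normal(size=H), rng.normal(size=H), rng.normal(size=H)
A = np.zeros((K, D, D))
A[0, 0, 1] = 0.5    # series 1 -> series 0 at lag 1
A[1, 2, 0] = -0.3   # series 0 -> series 2 at lag 2
lags = rng.normal(size=(K, D))
pred = lekvar_predict(lags, A, w1, b1, w2)
gc = granger_matrix(A)
```

In this sketch, series 2 has zero outgoing weights at every lag, so changing its past values leaves `pred` unchanged, which is the operational meaning of "does not Granger-cause" in Figure 1. In the decoupled setting of Figure 2, each target series would carry its own `v` and `t` vectors, and the lag set is read directly from the nonzero entries of `t`.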