On the Three Demons in Causality in Finance: Time Resolution, Nonstationarity, and Latent Factors

Xinshuai Dong, Haoyue Dai, Yewen Fan, Songyao Jin, Sathyamoorthy Rajendran, Kun Zhang

TL;DR

The paper tackles three demons of causality in finance (time-resolution mismatch, nonstationarity, and latent factors) within a single causal framework. It shows that temporally aggregated data can reveal instantaneous causal structure under linear VAR dynamics $\mathbf{X}_t = A\mathbf{X}_{t-1} + \mathbf{e}_t$, where the aggregated observations are $\tilde{\mathbf{X}}_t = \frac{1}{k}\sum_{i=1}^k \mathbf{X}_{i+(t-1)k}$ and the aggregated noise is $\tilde{\mathbf{e}}_t = \frac{1}{k}\sum_{i=1}^k \mathbf{e}_{i+(t-1)k}$. It then introduces CD-NOD, which exploits nonstationarity (concept drift) for causal discovery through a time surrogate and the modularity of causal mechanisms, and a rank-based latent causal discovery method that recovers latent structures and estimates edge coefficients using a latent linear model and GIN conditions. The approach is validated on SP100 stock data across periods (e.g., 2017, 2019, 2021), where it identifies changing causal modules, their driving forces, and sector-based latent clusters, supporting the feasibility of reasoning about causal interventions from observational data. Overall, the work provides a principled foundation for causality-guided finance analytics, with practical procedures for settings that involve mismatched time resolution, nonstationarity, and latent factors, along with concrete steps for inference and potential policy-relevant interventions.
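The averaging operator in the summary above can be illustrated with a minimal sketch. It simulates the VAR(1) process $\mathbf{X}_t = A\mathbf{X}_{t-1} + \mathbf{e}_t$ and forms $\tilde{\mathbf{X}}_t$ by averaging non-overlapping blocks of length $k$. The function names `simulate_var1` and `aggregate`, the matrix `A`, the standard Gaussian noise, and the zero initial state are all assumptions made for illustration; the sketch shows only the aggregation step, not the paper's identifiability argument or discovery procedure.

```python
import random


def simulate_var1(A, n_steps, seed=0):
    """Simulate X_t = A X_{t-1} + e_t with standard Gaussian noise and X_0 = 0.

    The noise distribution and the zero start are illustrative assumptions.
    """
    rng = random.Random(seed)
    d = len(A)
    x = [0.0] * d
    series = []
    for _ in range(n_steps):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [sum(A[i][j] * x[j] for j in range(d)) + e[i] for i in range(d)]
        series.append(x)
    return series


def aggregate(series, k):
    """Average non-overlapping blocks of length k.

    Block b (0-indexed) averages series[b*k], ..., series[b*k + k - 1], which
    matches X~_t = (1/k) * sum_{i=1..k} X_{i + (t-1)k} with t = b + 1.
    An incomplete final block is dropped.
    """
    d = len(series[0])
    n_blocks = len(series) // k
    return [
        [sum(series[b * k + i][j] for i in range(k)) / k for j in range(d)]
        for b in range(n_blocks)
    ]


# Hypothetical two-variable system in which variable 0 drives variable 1.
A = [[0.5, 0.0],
     [0.4, 0.5]]
X = simulate_var1(A, n_steps=12)
X_agg = aggregate(X, k=3)  # 12 high-resolution steps -> 4 aggregated steps
```

With `k=3`, twelve simulated observations collapse to four aggregated ones, which mimics observing a process at a coarser resolution than the one at which it evolves.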

Abstract

Financial data is generally time series in essence and thus suffers from three fundamental issues: the mismatch in time resolution, the time-varying property of the distribution - nonstationarity, and causal factors that are important but unknown/unobserved. In this paper, we follow a causal perspective to systematically look into these three demons in finance. Specifically, we reexamine these issues in the context of causality, which gives rise to a novel and inspiring understanding of how the issues can be addressed. Following this perspective, we provide systematic solutions to these problems, which hopefully would serve as a foundation for future research in the area.

Paper Structure (16 sections, 4 theorems, 7 equations, 8 figures)


Key Result

Theorem 4

Given two sets of variables $\mathbf{A}$ and $\mathbf{B}$ from a linear model with graph $\mathcal{G}$, $\text{rank}(\Sigma_{\mathbf{A},\mathbf{B}}) = \min \{|\mathbf{C}_{\mathbf{A}}| + |\mathbf{C}_{\mathbf{B}}| : (\mathbf{C}_{\mathbf{A}},\mathbf{C}_{\mathbf{B}})~\text{t-separates}~\mathbf{A}~\text{from}~\mathbf{B}~\text{in}~\mathcal{G}\}$ ...
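A small numeric check can make the rank statement concrete. The sketch below assumes a pure measurement model in which each observed variable is a linear function of independent, unit-variance latent variables plus independent noise, so the cross-covariance between two disjoint observed sets $\mathbf{A}$ and $\mathbf{B}$ is $\Lambda_{\mathbf{A}}\Lambda_{\mathbf{B}}^\top$. The loadings, the helper names `matrix_rank` and `cross_cov`, and the model itself are illustrative assumptions, not taken from the paper. With one shared latent parent, that latent alone t-separates $\mathbf{A}$ from $\mathbf{B}$ and the rank is 1; with two shared latent parents and generic loadings, two separating variables are needed and the rank is 2.

```python
from fractions import Fraction


def matrix_rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def cross_cov(load_A, load_B):
    """Cross-covariance Lambda_A @ Lambda_B^T.

    Assumes X = Lambda L + noise with independent unit-variance latents L and
    mutually independent noise terms, so noise does not enter the
    cross-covariance between disjoint sets A and B.
    """
    return [
        [sum(a * b for a, b in zip(row_a, row_b)) for row_b in load_B]
        for row_a in load_A
    ]


# One latent parent shared by A = {X1, X2} and B = {X3, X4}.
rank_one_latent = matrix_rank(cross_cov([[2], [3]], [[5], [7]]))

# Two latent parents shared by the same sets, with generic loadings.
rank_two_latents = matrix_rank(
    cross_cov([[2, 1], [3, 4]], [[5, 2], [7, 1]])
)
```

The exact arithmetic with `Fraction` avoids the tolerance choices a floating-point rank test would need; on real data the covariance is estimated, so a statistical rank test would be required instead.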

Figures (8)

  • Figure 1: Example illustrating the nonstationary relationship between Pfizer Inc. (PFE) and The Boeing Company (BA). Upper row: five scatter plots, one per year from 2019 to 2023, showing the daily log returns of the two stocks. The dependence pattern clearly differs from year to year. The linear regression line and the correlation are annotated. Lower row: daily price series of the two stocks, plotted for reference.
  • Figure 2: An illustrative example of the key process of our rank-based latent causal discovery algorithm.
  • Figure 3: Causal graph discovered by CD-NOD on ten selected stocks from SP100, using data from 2019 to 2023. The "DATE" node at the top is the date index, which serves as a surrogate for nonstationarity in the dataset. Variables that are direct children of "DATE" are those whose causal generating mechanisms (i.e., the conditional distribution of the variable given its parent stock variables) change over time.
  • Figure 4: Visualization of the estimated driving forces of changing causal modules, using CD-NOD's Phase III. Each row shows one stock whose causal module changes over time. The left panel shows the "driving force", i.e., the top two principal nonstationary components recovered by Kernel Nonstationary Visualization (KNV). Note that the lines primarily indicate change points in trends; they do not represent actual prices, and the y-axis carries no direct physical interpretation. The right panel shows the ten largest eigenvalues of the corresponding Gram matrix.
  • Figure 5: Bayesian change-point detection result for SP100 from 2008 to 2023.
  • ...and 3 more figures

Theorems & Definitions (11)

  • Example 1
  • Definition 1
  • Example 2
  • Definition 2: Treks (Sullivant et al., 2010)
  • Definition 3: T-separation (Sullivant et al., 2010)
  • Theorem 4: Rank and T-separation (Sullivant et al., 2010)
  • Definition 5: Atomic Cover
  • Theorem 7: Uniqueness of Rank Deficiency
  • Example 3
  • Theorem 9: Identifiability of Latent Causal Graphs
  • ...and 1 more