CAIFormer: A Causal Informed Transformer for Multivariate Time Series Forecasting

Xingyu Zhang, Wenwen Qiang, Siyu Zhao, Huijie Guo, Jiangmeng Li, Chuxiong Sun, Changwen Zheng

TL;DR

This work addresses multivariate time series forecasting by distinguishing the causal roles of historical variables instead of treating all histories equally. It introduces an all-to-one paradigm and CAIFormer, a causal-informed Transformer that partitions each target's history into an Endogenous Sub-segment, a Direct Causal Sub-segment, and a Collider Causal Sub-segment, and discards the Spurious Correlation Sub-segment. CAIFormer employs three blocks (ESPB, DCSPB, and CCSPB) and uses DAG-guided masks derived from the PC algorithm to separate intrinsic dynamics, direct causal influences, and collider-driven dependencies, with a collider constraint implemented via a projection to reduce the generalization gap. Empirical results on six real-world datasets with horizons up to $S=720$ show improved accuracy and robustness over strong baselines, and ablations confirm the contribution of each component and the benefit of using the learned causal structure to guide attention.

Abstract

Most existing multivariate time series forecasting methods adopt an all-to-all paradigm that feeds all variable histories into a unified model to predict their future values without distinguishing their individual roles. However, this undifferentiated paradigm makes it difficult to identify variable-specific causal influences and often entangles causally relevant information with spurious correlations. To address this limitation, we propose an all-to-one forecasting paradigm that predicts each target variable separately. Specifically, we first construct a Structural Causal Model from observational data and then, for each target variable, partition the historical sequence into four sub-segments according to the inferred causal structure: endogenous, direct causal, collider causal, and spurious correlation. The prediction relies solely on the first three, causally relevant sub-segments, while the spurious correlation sub-segment is excluded. Furthermore, we propose the Causal Informed Transformer (CAIFormer), a novel forecasting model comprising three components: the Endogenous Sub-segment Prediction Block, the Direct Causal Sub-segment Prediction Block, and the Collider Causal Sub-segment Prediction Block, which process the endogenous, direct causal, and collider causal sub-segments, respectively. Their outputs are then combined to produce the final prediction. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of CAIFormer.
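
The four-way partition described in the abstract can be illustrated with a small sketch. The abstract does not spell out how each role is read off the graph, so the rules below are assumptions made for illustration only: "direct causal" is taken to mean parents and children of the target, "collider causal" to mean other parents of the target's children, and everything else is treated as spurious. The function name `partition_by_causal_role` and the edge-list format are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple


def partition_by_causal_role(
    n_vars: int, edges: Iterable[Tuple[int, int]], target: int
) -> Dict[str, List[int]]:
    """Split variable indices into four roles relative to `target`.

    `edges` holds directed (cause, effect) pairs of a DAG.
    The role definitions are illustrative assumptions, not the paper's.
    """
    parents: List[Set[int]] = [set() for _ in range(n_vars)]
    children: List[Set[int]] = [set() for _ in range(n_vars)]
    for cause, effect in edges:
        children[cause].add(effect)
        parents[effect].add(cause)

    # Assumption: direct causal = variables joined to the target by one edge.
    direct = (parents[target] | children[target]) - {target}

    # Assumption: collider causal = other parents of the target's children,
    # i.e. variables that meet the target at a collider.
    co_parents: Set[int] = set()
    for child in children[target]:
        co_parents |= parents[child]
    collider = co_parents - direct - {target}

    # Everything left over is treated as spuriously correlated and dropped.
    spurious = set(range(n_vars)) - direct - collider - {target}

    return {
        "endogenous": [target],
        "direct_causal": sorted(direct),
        "collider_causal": sorted(collider),
        "spurious": sorted(spurious),
    }


# Toy graph: 1 -> 0, 0 -> 2, 3 -> 2, 1 -> 4, with variable 0 as the target.
roles = partition_by_causal_role(5, [(1, 0), (0, 2), (3, 2), (1, 4)], target=0)
print(roles)
```

In the toy graph, variable 4 is related to the target only through the shared cause 1, so under these assumptions it lands in the spurious group and would not be fed to the predictor.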

Paper Structure

This paper contains 25 sections, 1 theorem, 10 equations, 7 figures, 4 tables, 2 algorithms.

Key Result

Theorem 3.1

(Generalization Gap Reduction) For any predictor $f \in L^2(V)$, we can obtain $\Delta(f, \Psi f) = \| \Phi f \|_{L^2(V)}^2 \geq 0$, where $\Delta(f, \Psi f)$ denotes the generalization gap, which is defined by $\Delta (f,\Psi f) = \mathbb{E}[{(V_i - f(\mathcal{S}_{V_c},\mathcal{S}_{V_s}))^2}] - \ma

Figures (7)

  • Figure 1: (a)-(c) Visualization of Granger causality across variables in the ETTh1, ETTm1, and Exchange datasets. Each heatmap shows the causal strength matrix transformed as $-\log(P)$, where a darker color indicates a stronger causal influence from the row variable to the column variable. Diagonal entries are masked. (d) Representative partial SCM commonly encountered in MTSF. White nodes represent the target variable $V_i$; red and green nodes represent causally related variables and spuriously correlated variables, respectively.
  • Figure 2: Visualization of CAIFormer. (a) depicts the overall architecture, which consists of three blocks: ESPB, DCSPB, and CCSPB. (b) illustrates the Encoder structure, featuring Multi-Patch Attention in ESPB and Multi-variate Attention in both DCSPB and CCSPB. (c) shows causal discovery on a dataset. (d) demonstrates how constraints are imposed on Multi-variate Attention.
  • Figure 3: Effect of $\Psi$ Projection on Train/Test Loss
  • Figure 4: Visualization of causal DAGs discovered by the PC algorithm across different datasets. Directed edges indicate inferred causal relationships between variables, while undirected edges indicate uncertainty regarding causal direction. The results cover six datasets: (a) Weather, (b) Exchange, (c) ETTh1, (d) ETTh2, (e) ETTm1, and (f) ETTm2.
  • Figure 5: Visualization of the masks constructed from the DAG discovered by the PC algorithm on the ETTh1, ETTh2, and Weather datasets.
  • ...and 2 more figures
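
The captions for Figures 2(d) and 5 refer to masks built from the discovered DAG and imposed on Multi-variate Attention. The construction itself is not given here, so the following is only a minimal sketch under stated assumptions: each variable may attend to itself and to its direct causes, and disallowed positions receive zero attention weight. The helper names `build_variable_mask` and `masked_softmax` are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def build_variable_mask(
    n_vars: int, edges: Iterable[Tuple[int, int]]
) -> List[List[bool]]:
    """Return mask[query][key]; True means the query may attend to the key.

    Assumption: a variable attends to itself and to its direct causes only.
    """
    mask = [[i == j for j in range(n_vars)] for i in range(n_vars)]
    for cause, effect in edges:
        mask[effect][cause] = True
    return mask


def masked_softmax(scores: Sequence[float], allowed: Sequence[bool]) -> List[float]:
    """Softmax over one row of scores, with disallowed positions forced to 0."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    if not kept:
        raise ValueError("at least one position must be allowed")
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy graph with a single edge 1 -> 0; variable 2 is unrelated to variable 0.
mask = build_variable_mask(3, [(1, 0)])
weights = masked_softmax([0.0, 0.0, 5.0], mask[0])
print(mask[0], weights)
```

Even though the unrelated variable has the largest raw score in this example, the mask removes it before normalization, which is the effect a DAG-guided constraint is meant to have. Undirected edges, which the PC algorithm can also return, are not handled in this sketch.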

Theorems & Definitions (6)

  • Theorem 3.1
  • Definition D.1: Conditional Independence (Dawid, 1979)
  • Definition D.2: Chain
  • Definition D.3: Fork
  • Definition D.4: Collider
  • proof