DoFlow: Causal Generative Flows for Interventional and Counterfactual Time-Series Prediction

Dongze Wu, Feng Qiu, Yao Xie

TL;DR

DoFlow tackles the challenge of causal time-series forecasting by marrying a causal DAG with time-conditioned continuous normalizing flows, enabling observational, interventional, and counterfactual trajectory generation within a single probabilistic framework. It introduces an autoregressive, node-specific CNF conditioned on past histories and parent histories, trained with conditional flow matching, and supports abduction-action-prediction for counterfactuals along with explicit likelihoods for anomaly detection. Theoretical results establish a counterfactual recovery property under monotone SCM assumptions, while experiments across synthetic DAGs and real domains (hydropower, cancer treatment) demonstrate strong forecasting, credible causal query responses, and practical anomaly signals. This framework advances trustworthy causal inference in dynamical systems and suggests directions toward digital twins and physics-informed extensions that reason under interventions and uncertainty.
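As a rough illustration of the conditional flow matching objective mentioned above, the sketch below builds training targets along a straight-line path from a Gaussian base sample to an observation and scores a velocity model against them. The straight-line path, the names `cfm_batch`, `cfm_loss`, and `make_linear_velocity`, and the linear stand-in for the velocity network are all assumptions made for illustration; the summary does not specify the path or architecture DoFlow uses.

```python
import numpy as np

def cfm_batch(x, rng):
    """Build one conditional-flow-matching batch for a single node.

    x: array (batch, dim) of observed next-step values.
    Returns flow times s, interpolated points x_s, and target velocities.
    """
    z = rng.standard_normal(x.shape)         # base samples z ~ N(0, I)
    s = rng.uniform(size=(x.shape[0], 1))    # flow time s ~ U[0, 1]
    x_s = (1.0 - s) * z + s * x              # assumed straight-line path z -> x
    target = x - z                           # constant velocity of that path
    return s, x_s, target

def cfm_loss(velocity_fn, x, h, rng):
    """Mean squared error between model velocity and path velocity.

    h: array (batch, hidden) summarising the node's own and parents' histories.
    """
    s, x_s, target = cfm_batch(x, rng)
    pred = velocity_fn(x_s, s, h)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

def make_linear_velocity(dim, hidden, rng):
    """Hypothetical stand-in for a learned network: linear in [x_s, s, h]."""
    W = 0.1 * rng.standard_normal((dim + 1 + hidden, dim))
    return lambda x_s, s, h: np.concatenate([x_s, s, h], axis=1) @ W

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 2))   # toy observations
h = rng.standard_normal((32, 4))   # toy history embeddings
v = make_linear_velocity(2, 4, rng)
loss = cfm_loss(v, x, h, rng)
```

In a real model the linear map would be replaced by a trainable network and `loss` minimised by gradient descent, with one such velocity field per node of the DAG.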

Abstract

Time-series forecasting increasingly demands not only accurate observational predictions but also causal forecasting under interventional and counterfactual queries in multivariate systems. We present DoFlow, a flow-based generative model defined over a causal DAG that delivers coherent observational and interventional predictions, as well as counterfactuals through the natural encoding and decoding mechanism of continuous normalizing flows (CNFs). We also provide a supporting counterfactual recovery result under certain assumptions. Beyond forecasting, DoFlow provides explicit likelihoods of future trajectories, enabling principled anomaly detection. Experiments on synthetic datasets with various causal DAGs and on real-world hydropower and cancer-treatment time series show that DoFlow achieves accurate system-wide observational forecasting, enables causal forecasting over interventional and counterfactual queries, and effectively detects anomalies. This work contributes to the broader goal of unifying causal reasoning and generative modeling for complex dynamical systems.
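One simple way explicit trajectory likelihoods can be turned into an anomaly signal is to threshold them against a low quantile of log-likelihoods computed on held-out normal data. The sketch below assumes that rule; the function names, the quantile level `alpha`, and the numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_threshold(normal_loglik, alpha=0.05):
    """Return the alpha-quantile of log-likelihoods from held-out normal data."""
    return float(np.quantile(normal_loglik, alpha))

def flag_anomalies(loglik, threshold):
    """Flag trajectories whose log-likelihood falls below the threshold."""
    return np.asarray(loglik) < threshold

# Hypothetical log-likelihoods of normal trajectories under a trained model.
normal_loglik = np.array([-10.2, -9.8, -11.0, -10.5, -9.9, -10.1, -10.7, -10.3])
threshold = fit_threshold(normal_loglik, alpha=0.05)

# Two new trajectories: one typical, one far less likely under the model.
flags = flag_anomalies(np.array([-10.0, -25.0]), threshold)
```

Here only the second trajectory is flagged, since its log-likelihood lies well below every value seen on normal data.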

Paper Structure

This paper contains 42 sections, 3 theorems, 59 equations, 13 figures, 8 tables, 2 algorithms.

Key Result

Proposition 3.1

Given base samples $z_{\tau+1:T}\sim q(\cdot)$, the log-density of the generated time series obtained via the continuous normalizing flow is:
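For reference, the standard instantaneous change-of-variables identity for a continuous normalizing flow has the form below. This is the generic CNF result written in assumed notation (velocity field $v_\theta$, flow time $s\in[0,1]$, conditioning state $h_{t-1}$), not the paper's exact statement of Proposition 3.1.

```latex
% Generic CNF log-density for one step, with x_t(0) = z_t and x_t(1) = x_t,
% where dx_t(s)/ds = v_\theta(x_t(s), s; h_{t-1}).
\log p(x_t \mid h_{t-1})
  = \log q(z_t)
  - \int_0^1 \operatorname{tr}\!\left(
      \frac{\partial v_\theta}{\partial x}\bigl(x_t(s), s; h_{t-1}\bigr)
    \right) ds

% Under an autoregressive factorization, the trajectory log-density is the sum
% of the per-step terms over t = \tau+1, \dots, T.
\log p(x_{\tau+1:T}) = \sum_{t=\tau+1}^{T} \log p(x_t \mid h_{t-1})
```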

Figures (13)

  • Figure 1: (A) RNN State Update. (B) Observational/Interventional Forecasting. Forecasts are generated by decoding from latent $z_{i,t}\sim \mathcal{N}(0,1)$, conditioned on $\hat{H}_{i,t-1}$ updated with the last predicted $(\hat{x}_{i,t-1},\hat{x}_{\mathrm{pa}(i),t-1})$. (C) Counterfactual Forecasting. A factual observation $x_{i,t}^{\mathrm{F}}$ is encoded with its factual state $H_{i,t-1}^{\mathrm{F}}$ into $z_{i,t}^{\mathrm{F}}$, then decoded under the counterfactual state $\hat{H}_{i,t-1}^{\mathrm{CF}}$ to yield $\hat{x}_{i,t}^{\mathrm{CF}}$. Factual states $H_{i,t-1}^{\mathrm{F}}$ are updated from observed $x_{i,t-1}^{\mathrm{F}}$, while counterfactual states $\hat{H}_{i,t-1}^{\mathrm{CF}}$ are updated from the previously generated $\hat{x}_{i,t-1}^{\mathrm{CF}}$.
  • Figure 2: Left: "Layer" interventional forecasting results. Nodes $X_{1,t}$, $X_{2,t}$, and $X_{3,t}$ are intervened. DoFlow provides 50% and 90% prediction intervals; the orange lines indicate the true interventional future. Right: "Tree" counterfactual forecasting results. Node $X_{1,t}$ is intervened. DoFlow provides a single forecast in green; the orange lines indicate the true counterfactual future.
  • Figure 3: Hydropower -- Interventional.
  • Figure 4: Tree graph over 8 nodes. Exogenous variables $U_{i,t}$ are omitted for clarity but exist for every node at each time $t$. Left: Full node-level causal structure between consecutive time steps, with all variables $\{X_{1,t}, \dots, X_{8,t}\}$ present at each step. Right: Rolled-up (time-suppressed) view over different nodes $\{X_{1},\dots,X_{8}\}$. Each arrow $X_i \to X_j$ (with $i \neq j$) denotes a lag-1 temporal dependency $X_{i,t-1} \to X_{j,t}$ that holds for all $t$. Both panels depict the same underlying structure.
  • Figure 5: Diamond graph over 10 nodes. Exogenous variables $U_{i,t}$ are omitted for clarity but exist for every node at each time $t$. Left: Full node-level causal structure between consecutive time steps, with all variables $\{X_{1,t}, \dots, X_{10,t}\}$ present at each step. Right: Rolled-up (time-suppressed) view over different nodes $\{X_{1},\dots,X_{10}\}$. Each arrow $X_i \to X_j$ (with $i \neq j$) denotes a lag-1 temporal dependency $X_{i,t-1} \to X_{j,t}$ that holds for all $t$. Both panels depict the same underlying structure.
  • ...and 8 more figures
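
The encode-then-decode procedure described in the Figure 1 caption can be sketched with a toy invertible map in place of the CNF. In the sketch below, the affine conditioner `mu_sigma`, the scalar recurrent update `update_state`, and all numeric constants are hypothetical stand-ins chosen only to make the abduction, action, and prediction steps concrete; they are not the paper's architecture.

```python
import numpy as np

def mu_sigma(h):
    """Hypothetical conditioner: map a scalar state h to a location and a positive scale."""
    return np.tanh(h), 0.5 + 0.1 * h ** 2

def encode(x, h):
    """Abduction: map observation x to its latent z under state h."""
    mu, sigma = mu_sigma(h)
    return (x - mu) / sigma

def decode(z, h):
    """Prediction: map latent z back to data space under state h."""
    mu, sigma = mu_sigma(h)
    return mu + sigma * z

def update_state(h, x, x_parent):
    """Toy recurrent update standing in for the RNN state update."""
    return 0.5 * h + 0.3 * x + 0.2 * x_parent

def counterfactual_rollout(x_f, parent_f, parent_cf, h0=0.0):
    """Roll out a counterfactual series for one node.

    x_f: factual series of the node; parent_f / parent_cf: the parent's
    factual and counterfactual (intervened) series.
    """
    h_f, h_cf = h0, h0
    x_cf = []
    for t in range(len(x_f)):
        z = encode(x_f[t], h_f)      # encode under the factual state
        x_hat = decode(z, h_cf)      # decode under the counterfactual state
        x_cf.append(x_hat)
        h_f = update_state(h_f, x_f[t], parent_f[t])    # factual state uses observed values
        h_cf = update_state(h_cf, x_hat, parent_cf[t])  # counterfactual state uses generated values
    return np.array(x_cf)

x_f = np.array([0.2, -0.1, 0.4, 0.3])
parent_f = np.array([1.0, 0.5, -0.2, 0.0])
unchanged = counterfactual_rollout(x_f, parent_f, parent_f)     # no intervention
shifted = counterfactual_rollout(x_f, parent_f, parent_f + 2.0) # parent shifted by +2
```

With no intervention the rollout reproduces the factual series exactly, because encoding and decoding use the same state at every step; once the parent series is changed, the two states diverge and the decoded values move away from the factual ones.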

Theorems & Definitions (5)

  • Proposition 3.1
  • Remark 4.2
  • Proposition 4.3: Encoded as a function of the exogenous noise $U_t$
  • Remark 4.4
  • Corollary 4.5: Counterfactual recovery