Using Time Structure to Estimate Causal Effects

Tom Hochsprung, Jakob Runge, Andreas Gerhardus

TL;DR

The paper develops a time-domain identifiability framework for causal effects in SVAR processes with latent confounding, avoiding the need for external instruments or negative controls. By introducing the full time graph, treks, and a key linear system $\Gamma_{R,Y_t} = \Gamma_{R,C} v$, it shows that direct causal effects are generically identifiable under concrete graphical and lag-based conditions. The main contributions include a graphical identifiability theorem, lag-based sufficiency criteria, and extensive numerical validation on synthetic and real-world electricity-market data, illustrating practical identifiability without auxiliary time series. This work provides a principled path to estimating direct (and Wright-total) causal effects in time series with latent confounding, with implications for fields spanning economics, climatology, and epidemiology where unobserved drivers are common.

Abstract

There exist several approaches for estimating causal effects in time series when latent confounding is present. Many of these approaches rely on additional auxiliary observed variables or time series such as instruments, negative controls or time series that satisfy the front- or backdoor criterion in certain graphs. In this paper, we present a novel approach for estimating direct (and, via Wright's path rule, total) causal effects in a time series setup which does not rely on additional auxiliary observed variables or time series. This approach assumes that the underlying time series is a Structural Vector Autoregressive (SVAR) process and estimates direct causal effects by solving certain linear equation systems made up of different covariances and model parameters. We state sufficient graphical criteria in terms of the so-called full time graph under which these linear equation systems are uniquely solvable and under which their solutions contain the to-be-identified direct causal effects as components. We also state sufficient lag-based criteria under which the previously mentioned graphical conditions are satisfied and, thus, under which direct causal effects are identifiable. Several numerical experiments underline the correctness and applicability of our results.
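
The idea of recovering an effect from a covariance-based linear system can be illustrated with a toy sketch. It is not the paper's construction and does not check the conditions of the main theorem. The model, the function names and the parameter values are all assumptions made for illustration: a latent white-noise series `L` confounds `X_t` and `Y_t`, and the lagged value `X_{t-1}` plays the role of a one-element set $R$ with $C = \{X_t\}$, so that the scalar system $\Gamma_{R,Y_t} = \Gamma_{R,C}\, v$ yields $v = b$.

```python
import numpy as np

def simulate(a, b, T, seed=0):
    """Toy process (assumed for illustration):
       X_t = a * X_{t-1} + L_t + eX_t
       Y_t = b * X_t     + L_t + eY_t
    L is latent white noise that confounds X_t and Y_t."""
    rng = np.random.default_rng(seed)
    L = rng.normal(size=T)
    eX = rng.normal(size=T)
    eY = rng.normal(size=T)
    X = np.zeros(T)
    for t in range(1, T):
        X[t] = a * X[t - 1] + L[t] + eX[t]
    Y = b * X + L + eY
    return X, Y

def cov(u, v):
    """Empirical covariance of two equally long 1-d arrays."""
    return np.mean((u - u.mean()) * (v - v.mean()))

def lag_estimate(X, Y):
    """Solve cov(X_{t-1}, Y_t) = cov(X_{t-1}, X_t) * v for v.
    L_t is independent of X_{t-1}, so the confounder drops out."""
    return cov(X[:-1], Y[1:]) / cov(X[:-1], X[1:])

def naive_regression(X, Y):
    """Regress Y_t on X_t directly; biased because L_t enters both."""
    return cov(X, Y) / cov(X, X)

if __name__ == "__main__":
    X, Y = simulate(a=0.8, b=0.5, T=100_000)
    print("lag-based estimate:", lag_estimate(X, Y))
    print("naive regression:  ", naive_regression(X, Y))
```

In this toy model the naive regression is biased upwards by $\mathrm{Var}(L_t)/\mathrm{Var}(X_t)$, whereas the lag-based ratio uses only the observed series and its own past.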

Paper Structure

This paper contains 25 sections, 13 theorems, 109 equations, 7 figures, 3 tables.

Key Result

Theorem 8

(Main identifiability result) Assume a stable SVAR process satisfying Assumptions \ref{assumption_no_instantaneous_self_edges}, \ref{assumption_acyclicity} and \ref{assumption_latents_have_no_observed_parents}. Furthermore, assume that in the full time graph one has [...] Furthermore, assume that [...] Define $C:=F^{\textnormal{obs}}\cup \textnormal{pa}^{\textnormal{obs}}(F^{\textnormal{obs}}) \cup \dots$

Figures (7)

  • Figure 1: Example full time graph for Examples \ref{example_lag_notation}, \ref{ex_treks} and \ref{example1}. The different colors and hatchings are only relevant for Example \ref{example1}, where the red edge corresponds to the parameter $A^{(3)}_{YY}$ that one wants to identify. The blue vertex is the only element of $B_U$ and the yellow vertex is the only element of $F^{\textnormal{obs}}$. The vertices with vertical hatching are elements of $C$ and the vertices with horizontal hatching are elements of $R$ (vertices with grid hatching are in both $R$ and $C$).
  • Figure 2: A numerical validation of Example \ref{example1}. For $1000$ different parameters that yield a stable SVAR process inducing the full time graph from Figure \ref{example_graph_1} and time series lengths $T\in\{10^2, 10^3, 10^4, 10^5\}$, we plot the error of the estimate of $A^{(3)}_{YY}$ from equation \ref{est_ex_1} relative to the true $A^{(3)}_{YY}$. Remark: In these boxplots, the whiskers outside the boxes correspond to the smallest and largest points within $1.5$ times the interquartile range (calculated on the original scale). Outliers are highlighted in red. The ordinate axis is $\log_{10}$-transformed.
  • Figure 3: Boxplots for numerical experiments from Section \ref{sec_numerical_experiments}. Remark: In these boxplots, the whiskers outside the boxes correspond to the smallest and largest points within $1.5$ times the interquartile range (calculated on the original scale). Outliers are highlighted in red. The ordinate axis is $\log_{10}$-transformed. The number of points with a median estimation error strictly larger than $10^0$ is (stated from $10^2$ to $10^6$): $151$, $81$, $62$, $44$, $23$.
  • Figure 4: Full time graph for Example \ref{ex_app_1}. Here, the red edges correspond to $A^{(5)}_{YO^2}$ and the blue edges to $A^{(1)}_{YY}$.
  • Figure 5: Example full time graph. Here, the red edges correspond to $A^{(3)}_{YX}$, the blue edges to $A^{(1)}_{YY}$ and the orange edges to $A^{(3)}_{YY}$.
  • ...and 2 more figures
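
The captions above refer to parameters "that yield a stable SVAR process". A minimal sketch of one way to draw such parameters is given below. It assumes a process with lagged coefficient matrices only (no instantaneous effects), uniform sampling and rejection on the spectral radius of the companion matrix; the function names, the sampling distribution and the `scale` value are illustrative assumptions, not the procedure used in the paper.

```python
import numpy as np

def companion(A_lags):
    """Companion matrix of a VAR(p) with lag matrices A_1, ..., A_p (each d x d)."""
    p = len(A_lags)
    d = A_lags[0].shape[0]
    top = np.hstack(A_lags)                      # shape (d, d*p)
    if p == 1:
        return top
    bottom = np.hstack([np.eye(d * (p - 1)),     # shifts the lagged blocks down
                        np.zeros((d * (p - 1), d))])
    return np.vstack([top, bottom])

def is_stable(A_lags):
    """Stable iff all eigenvalues of the companion matrix lie inside the unit circle."""
    eigenvalues = np.linalg.eigvals(companion(A_lags))
    return bool(np.max(np.abs(eigenvalues)) < 1.0)

def sample_stable(d, p, rng, scale=0.5, max_tries=1000):
    """Rejection-sample lag matrices until a stable set is found (assumed scheme)."""
    for _ in range(max_tries):
        A_lags = [rng.uniform(-scale, scale, size=(d, d)) for _ in range(p)]
        if is_stable(A_lags):
            return A_lags
    raise RuntimeError("no stable parameter set found; try a smaller scale")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A_lags = sample_stable(d=3, p=2, rng=rng)
    print("spectral radius:", np.max(np.abs(np.linalg.eigvals(companion(A_lags)))))
```

For a model with instantaneous effects, the same check would be applied to the reduced-form lag matrices instead.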

Theorems & Definitions (49)

  • Example 1
  • Definition 4: Treks
  • Example 2
  • Remark 5
  • Definition 6: Walk/Path & Trek monomial
  • Example 2: continued
  • Definition 7: System of treks/directed paths
  • Theorem 8
  • Example 3
  • Proposition 9: Consistent Estimation
  • ...and 39 more