Linear-Time Algorithms for Front-Door Adjustment in Causal Graphs

Marcel Wienöbst, Benito van der Zander, Maciej Liśkiewicz

TL;DR

Problem: Identify and estimate the total causal effect $P({\bf y} | do({\bf x}))$ when unobserved confounding precludes covariate adjustment, by using front-door sets in a DAG. Approach: develop linear-time algorithms that (i) find a front-door adjustment set ${\bf Z}$ in $O(n+m)$, (ii) enumerate all front-door sets with $O(n(n+m))$ delay, and (iii) compute an inclusion-minimal front-door set in $O(n+m)$ time, exploiting Bayes-Ball for $d$-separation and a linear-time forbidden-vertex propagation. Contributions: first linear-time FD set finder, $O(n(n+m))$-delay enumeration, and a linear-time minimal-FD-set finder, with multi-language implementations and large-scale empirical validation. Significance: speeds up front-door identifications to be comparable with back-door methods, enabling practical causal effect estimation in large DAGs; future work includes finding true minimum-size FD sets efficiently and handling causal discovery over Markov-equivalence classes.

Abstract

Causal effect estimation from observational data is a fundamental task in the empirical sciences. It becomes particularly challenging when unobserved confounders are present in a system. This paper focuses on front-door adjustment, a classic technique which, using observed mediators, allows causal effects to be identified even in the presence of unobserved confounding. While the statistical properties of front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. In 2022, Jeong, Tian, and Bareinboim presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given directed acyclic graph (DAG), with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the causal graph. In our work, we give the first linear-time, i.e., $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity. This result implies an $O(n(n+m))$-delay enumeration algorithm for all front-door adjustment sets, again improving on previous work by a factor of $n^3$. Moreover, we provide the first linear-time algorithm for finding a minimal front-door adjustment set. We offer implementations of our algorithms in multiple programming languages to facilitate practical usage and empirically validate their feasibility, even for large graphs.
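
For orientation, the abstract does not spell out the adjustment formula itself. The standard front-door formula (due to Pearl, restated here rather than quoted from this page) expresses the interventional distribution through observational quantities once a set ${\bf Z}$ satisfying the front-door criterion relative to $({\bf X}, {\bf Y})$ is known; the primed symbol ${\bf x}'$ is just a summation variable ranging over the values of ${\bf X}$:

$$
P({\bf y} \mid do({\bf x})) \;=\; \sum_{{\bf z}} P({\bf z} \mid {\bf x}) \sum_{{\bf x}'} P({\bf y} \mid {\bf x}', {\bf z})\, P({\bf x}')
$$

The algorithms summarized above address the graphical step, finding such a set ${\bf Z}$, and not the statistical estimation of the terms on the right-hand side.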

Paper Structure

This paper contains 11 sections, 12 theorems, 1 equation, 8 figures, 1 table, 8 algorithms.

Key Result

Lemma 1

It is possible to find ${\bf Z}_{(\mathrm{i})} \subseteq {\bf R}$, i.e., all vertices $Z$ in ${\bf R}$ with $(Z \perp\!\!\!\perp {\bf X})_{G_{\underline{{\bf X}}}}$, in time $O(n+m)$.
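
One way to see why linear time is plausible for Lemma 1 is the following minimal sketch. It is not the authors' implementation. It assumes the graph is given as a list of directed `(parent, child)` pairs and uses the fact that, with an empty conditioning set, a vertex is d-connected to ${\bf X}$ in $G_{\underline{{\bf X}}}$ exactly when it is a proper ancestor of ${\bf X}$ or a descendant of such an ancestor. The function names `find_z_i` and `_reach` are hypothetical.

```python
from collections import deque


def _reach(start, adj):
    """Return every vertex reachable from `start` along `adj` (BFS, O(n+m))."""
    seen = set(start)
    queue = deque(start)
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def find_z_i(edges, X, R):
    """Vertices Z in R that are d-separated from X (given the empty set)
    in the graph obtained by deleting all edges that leave X."""
    X = set(X)
    children, parents = {}, {}
    for u, v in edges:
        if u in X:
            continue  # build G with the outgoing edges of X removed
        children.setdefault(u, []).append(v)
        parents.setdefault(v, []).append(u)

    # Every open path starts X <- ..., so first walk upwards from X.
    ancestors = _reach(X, parents)
    # Then walk downwards from the proper ancestors (no colliders allowed).
    connected = _reach(ancestors - X, children)
    return {z for z in R if z not in X and z not in connected}


# Canonical front-door graph; U is the unobserved confounder (assumed edge list).
edges = [("U", "X"), ("U", "Y"), ("X", "Z"), ("Z", "Y")]
print(find_z_i(edges, {"X"}, {"Z"}))
```

Both traversals touch each vertex and edge at most once, which gives the $O(n+m)$ bound for this special case; the paper's Bayes-Ball-based procedure handles d-separation more generally.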

Figures (8)

  • Figure 1: Causal graphs, where $X$ is the treatment, $Y$ the outcome, and $U$ represents an unobserved confounder. Graph (i) is a canonical example, with the (unique) set $\{Z\}$ satisfying the front-door (FD) criterion relative to $(X,Y)$. For graph (ii), there exist 13 FD sets; both the algorithm of Jeong, Tian, and Bareinboim (2022) and our basic Algorithm \ref{alg:finding} output ${\bf Z}=\{A,B,C,D\}$ of maximum size. In contrast, Algorithm \ref{alg:finding-minimal} computes a minimal FD set $\{D\}$ of size 1. Graph (iii) illustrates the non-monotonicity of the FD criterion: while both $\{A,B,C\}$ and $\{A\}$ are FD sets, none of $\{A,B\}$, $\{A,C\}$, and $\{B,C\}$ satisfies the FD criterion.
  • Figure 2: Running example for the algorithms for finding FD sets in $O(n+m)$ given in this section. Nodes in ${\bf Z}_{(\mathrm{i})}$ are marked green and nodes in ${\bf Z}_{(\mathrm{ii})}$ are marked blue.
  • Figure 3: Example graph for finding a minimal FD set.
  • Figure 4: Log-log plot of the average run time in seconds for the algorithm of Jeong, Tian, and Bareinboim (2022) (jtb), Algorithm \ref{alg:finding} (find), and Algorithm \ref{alg:finding-minimal} (min) on Erdős-Rényi graphs with $1.5n$ (left) and $5n$ (right) edges, corresponding to expected vertex degrees $3$ and $10$. $|{\bf X}|$ and $|{\bf Y}|$ are random integers between $1$ and $3$; $|{\bf I}| = 0$ and $|{\bf R}| = 0.5n$. For each choice of $n$, we average over 50 graphs.
  • Figure 5: Run time comparison between find (Algorithm \ref{alg:finding}), min (Algorithm \ref{alg:finding-minimal}), and jtb (Jeong, Tian, and Bareinboim, 2022) for additional parameter choices. In particular, we choose $|{\bf E}| = 2.5n$ (corresponding to expected degree $5$) and vary the choices of $|{\bf R}|$, $|{\bf X}|$, and $|{\bf Y}|$. As can be seen, the run time differences are minor.
  • ...and 3 more figures
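
The front-door criterion that the Figure 1 caption refers to can be checked directly for a given candidate set. The sketch below is a plain verifier and not one of the paper's linear-time algorithms: it tests d-separation via the moralized ancestral graph, which can be quadratic in the number of parents of a vertex. The function names and the edge list for graph (i) are assumptions (the edge list is the usual textbook front-door graph), and ${\bf X}$, ${\bf Y}$, ${\bf Z}$ are assumed to be pairwise disjoint.

```python
def d_separated(edges, A, B, C):
    """True iff A and B are d-separated given C (moralized ancestral graph test)."""
    A, B, C = set(A), set(B), set(C)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # Ancestral closure of A, B and C.
    anc = A | B | C
    stack = list(anc)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in anc:
                anc.add(p)
                stack.append(p)
    # Moralize: connect each vertex to its parents and marry co-parents.
    nbr = {v: set() for v in anc}
    for v in anc:
        ps = list(parents.get(v, ()))
        for p in ps:
            nbr[v].add(p)
            nbr[p].add(v)
            for q in ps:
                if p != q:
                    nbr[p].add(q)
    # Undirected search from A that never enters C.
    seen = A - C
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in B:
            return False
        for w in nbr[v]:
            if w not in seen and w not in C:
                seen.add(w)
                stack.append(w)
    return True


def satisfies_front_door(edges, X, Y, Z):
    """Check the three front-door conditions for Z relative to (X, Y)."""
    X, Y, Z = set(X), set(Y), set(Z)
    # (1) Z intercepts all directed paths from X to Y.
    children = {}
    for u, v in edges:
        if u not in Z and v not in Z:
            children.setdefault(u, []).append(v)
    seen, stack = set(X), list(X)
    while stack:
        v = stack.pop()
        for w in children.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if seen & Y:
        return False
    # (2) No open back-door path from X to Z: test in G without edges leaving X.
    no_out_x = [(u, v) for u, v in edges if u not in X]
    if not d_separated(no_out_x, X, Z, set()):
        return False
    # (3) X blocks all back-door paths from Z to Y: test in G without edges leaving Z.
    no_out_z = [(u, v) for u, v in edges if u not in Z]
    return d_separated(no_out_z, Z, Y, X)


# Assumed edge list for the canonical graph (i): U -> X, U -> Y, X -> Z, Z -> Y.
graph_i = [("U", "X"), ("U", "Y"), ("X", "Z"), ("Z", "Y")]
print(satisfies_front_door(graph_i, {"X"}, {"Y"}, {"Z"}))
```

A checker like this is mainly useful for validating the output of a faster search routine on small graphs, since it examines one candidate set at a time.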

Theorems & Definitions (24)

  • Lemma 1
  • proof
  • Definition 1
  • Lemma 2
  • proof
  • Theorem 1
  • Theorem 2
  • proof
  • Corollary 1
  • Definition 2
  • ...and 14 more