Practically Effective Adjustment Variable Selection in Causal Inference

Atsushi Noda, Takashi Isozaki

TL;DR

This work tackles the problem of selecting causal adjustment variables for estimating $P(Y|do(X=x))$ when multiple back-door-compliant sets exist and data are limited. It introduces CAVS, a two-step, correlation-informed algorithm that first enumerates minimal back-door sets and then selects the one with the smallest mutual information with the intervention variable $X$, thereby reducing data-sparsity bias. The authors extend the approach to CPDAGs using the Generalized Adjustment Criterion and amenability, proving that, under certain edge-orientation conditions, there exists a common valid adjustment set across Markov-equivalent DAGs, ensuring computable interventions in CPDAGs. Empirical results on established datasets and artificial graphs demonstrate that CAVS yields lower estimation error and greater robustness to small sample sizes compared with baselines such as using $pa(X)$ or minimally sufficient back-door sets. Overall, the method improves practical causal effect estimation in settings with uncertain graphs and limited data, expanding applicability to real-world scenarios.
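The selection step described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes discrete data given as a list of dicts, assumes the candidate back-door sets have already been enumerated, and uses a plain plug-in estimate of mutual information. The names `mutual_information`, `select_adjustment_set`, and the toy variables `P1` and `V1` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(rows, x, zs):
    """Plug-in estimate of I(X; Z) in nats, treating the variables
    named in `zs` as a single joint discrete variable."""
    n = len(rows)
    joint = Counter((r[x], tuple(r[z] for z in zs)) for r in rows)
    px = Counter(r[x] for r in rows)
    pz = Counter(tuple(r[z] for z in zs) for r in rows)
    # sum over observed cells of p(x,z) * log( p(x,z) / (p(x) p(z)) )
    return sum(
        (c / n) * log(c * n / (px[xv] * pz[zv]))
        for (xv, zv), c in joint.items()
    )


def select_adjustment_set(rows, x, candidates):
    """Return the candidate set with the smallest estimated I(X; Z)."""
    return min(candidates, key=lambda zs: mutual_information(rows, x, zs))


# Toy data (made up for illustration): P1 copies X, V1 is independent of X.
rows = [{"X": a, "P1": a, "V1": b} for a in (0, 1) for b in (0, 1)] * 5
chosen = select_adjustment_set(rows, "X", [("P1",), ("V1",)])
print(chosen)  # ('V1',)
```

In this toy case both candidates are simply assumed to be valid back-door sets; the sketch only shows how the weaker dependence on $X$ breaks the tie.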

Abstract

In the estimation of causal effects, one common method for removing the influence of confounders is to adjust for variables that satisfy the back-door criterion. However, it is not always possible to uniquely determine sets of such variables. Moreover, real-world data is almost always limited, which means it may be insufficient for statistical estimation. Therefore, we propose criteria for selecting variables from a list of candidate adjustment variables, along with an algorithm to prevent accuracy degradation in causal effect estimation. We initially focus on directed acyclic graphs (DAGs) and then outline specific steps for applying this method to completed partially directed acyclic graphs (CPDAGs). We also present and prove a theorem on the computability of causal effects in CPDAGs. Finally, we demonstrate the practical utility of our method using both existing and artificial data.

Paper Structure

This paper contains 13 sections, 4 theorems, 5 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

$Z$ is an adjustment set relative to $\{X,Y\}$ in a DAG or CPDAG $G$ if and only if $Z$ satisfies the generalized adjustment criterion relative to $\{X,Y\}$ in $G$.
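For context, the standard meaning of "adjustment set" in this literature (stated here for discrete variables, as a reminder rather than as a quotation from the paper) is that the interventional distribution can be computed from observational quantities by the adjustment formula:

$$
P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z).
$$

When $Z$ is empty, the formula reduces to $P(Y = y \mid do(X = x)) = P(Y = y \mid X = x)$.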

Figures (8)

  • Figure 1: (Left) There is a strong correlation between $X$ and $P1$, indicated by the bold arrow. $P1$ definitely closes the back-door path, but so do $V1$ and $V2$. (Right) $X$ has three parents $\{P1, P2, P3\}$, but $V1$ alone is sufficient to close the back-door path.
  • Figure 2: A specific DAG and CPT example in which accuracy is reduced. In the conditional probability table for $Y$, the numbers in parentheses indicate the number of data samples, and the numbers above them indicate the probability of $Y$ obtained by aggregating the data. In this example, the computed effect of $do(X=0)$ will be unstable because few data samples are available for the case $X=0$, $Z=0$. This indicates that it is important to examine the data as well as the DAG structure.
  • Figure 3: An example explaining each step of the CAVS algorithm. (a) shows the DAG, (b) shows the graph after Step 1-1, and (c) shows the graph after Step 1-2.
  • Figure 4: (a) An example CPDAG. (b), (c), (d) Markov-equivalent DAGs with different adjustment variables. (b) and (c) have the same direction of edges adjacent to $X$ and the same adjustment variables $\{Z1, Z2\}$. In contrast, the direction of the edges adjacent to $X$ in (d) differs from (b) and (c), and the adjustment set in (d) is empty. In other words, the DAGs represented by the CPDAG in (a) can be divided into {(b), (c)} and {(d)} according to the direction of the edges adjacent to $X$, and the adjustment variables can be determined for each group.
  • Figure 5: DAG structure of the Insurance data. The intervention variable Accident is located one level above the outcome variable OtherCarCost, and Accident has three parent variables {Mileage, Antilock, DrivQuality}.
  • ...and 3 more figures
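
The data-sparsity issue described in the Figure 2 caption can be made concrete with a small sketch. It assumes discrete data as a list of dicts and a single adjustment variable `Z`; the function name `backdoor_estimate` and all counts below are made up for illustration and are not taken from the paper's figure.

```python
from collections import Counter


def backdoor_estimate(rows, x, y, zs, x_val, y_val):
    """Plug-in estimate of P(Y=y_val | do(X=x_val)) = sum_z P(y|x,z) P(z).

    Also returns the number of samples in each (X=x_val, Z=z) cell,
    since small cells make the corresponding P(y|x,z) unreliable.
    """
    n = len(rows)
    key = lambda r: tuple(r[z] for z in zs)
    z_counts = Counter(key(r) for r in rows)
    xz_counts = Counter(key(r) for r in rows if r[x] == x_val)
    xyz_counts = Counter(
        key(r) for r in rows if r[x] == x_val and r[y] == y_val
    )
    estimate = 0.0
    for z, nz in z_counts.items():
        if xz_counts[z] == 0:
            raise ValueError(f"no samples with {x}={x_val} and Z={z}")
        estimate += (xyz_counts[z] / xz_counts[z]) * (nz / n)
    return estimate, dict(xz_counts)


def make(n, x, y, z):
    """Repeat one (hypothetical) observation n times."""
    return [{"X": x, "Y": y, "Z": z}] * n


# Hypothetical counts: only 2 of 100 samples fall in the cell X=0, Z=0.
rows = (make(1, 0, 1, 0) + make(1, 0, 0, 0) + make(48, 1, 0, 0)
        + make(10, 0, 1, 1) + make(30, 0, 0, 1) + make(10, 1, 0, 1))
estimate, cells = backdoor_estimate(rows, "X", "Y", ["Z"], 0, 1)
print(estimate, cells)  # 0.375 {(0,): 2, (1,): 40}
```

Here the cell $X=0$, $Z=0$ carries half of the weight $P(Z=0)=0.5$ in the sum but rests on only two samples, so a single changed observation would shift the estimate by 0.25.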

Theorems & Definitions (9)

  • Definition 1: Back-door criterion, Pearl, 2009
  • Definition 2: Adjustment Criterion (AC), Shpitser et al., 2010
  • Definition 3: Amenability for DAGs and CPDAGs, Perković et al., 2015
  • Definition 4: Generalized Adjustment Criterion (GAC), Perković et al., 2015
  • Theorem 1: Perković et al., 2015
  • Lemma 1: Perković et al., 2015
  • Lemma 2: Perković et al., 2015
  • Theorem 2
  • proof