Practically Effective Adjustment Variable Selection in Causal Inference
Atsushi Noda, Takashi Isozaki
TL;DR
This work tackles the problem of selecting causal adjustment variables for estimating $P(Y|do(X=x))$ when multiple back-door-compliant sets exist and data are limited. It introduces CAVS, a two-step, correlation-informed algorithm that first enumerates minimal back-door sets and then selects the one with the smallest mutual information with the intervention variable $X$, thereby reducing data-sparsity bias. The authors extend the approach to CPDAGs using the Generalized Adjustment Criterion and amenability, proving that, under certain edge-orientation conditions, there exists a common valid adjustment set across Markov-equivalent DAGs, ensuring computable interventions in CPDAGs. Empirical results on established datasets and artificial graphs demonstrate that CAVS yields lower estimation error and greater robustness to small sample sizes compared with baselines such as using $pa(X)$ or minimally sufficient back-door sets. Overall, the method improves practical causal effect estimation in settings with uncertain graphs and limited data, expanding applicability to real-world scenarios.
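The selection step described above can be illustrated with a minimal Python sketch. It assumes discrete data, a simple plug-in estimate of mutual information, and a list of candidate back-door sets that has already been enumerated; the function names `mutual_information` and `select_adjustment_set` are hypothetical and are not taken from the paper, whose actual estimator and enumeration procedure may differ.

```python
from collections import Counter
from math import log


def mutual_information(rows, x, zs):
    """Plug-in estimate of I(X; Z) in nats from discrete samples.

    rows: list of dicts mapping variable name -> observed value
    x:    name of the intervention variable
    zs:   tuple of variable names forming one candidate adjustment set
    """
    n = len(rows)
    # Empirical counts of the joint (X, Z) and of each marginal.
    joint = Counter((r[x], tuple(r[z] for z in zs)) for r in rows)
    px = Counter(r[x] for r in rows)
    pz = Counter(tuple(r[z] for z in zs) for r in rows)
    # sum over observed cells of p(x,z) * log( p(x,z) / (p(x) p(z)) )
    return sum((c / n) * log(c * n / (px[a] * pz[b]))
               for (a, b), c in joint.items())


def select_adjustment_set(rows, x, candidates):
    """Return the candidate set with the smallest estimated I(X; Z).

    Assumption: every element of `candidates` already satisfies the
    back-door criterion; this sketch does not check that.
    """
    return min(candidates, key=lambda zs: mutual_information(rows, x, zs))


# Toy data: X copies A exactly and is independent of B.
rows = [{"X": a, "A": a, "B": b} for a in (0, 1) for b in (0, 1)] * 5
chosen = select_adjustment_set(rows, "X", [("A",), ("B",)])
```

In this toy data the set containing `B` is chosen, because `X` is fully determined by `A` but carries no information about `B`. A set that is strongly dependent on `X` leaves few samples in many (x, z) cells, which is the data-sparsity problem the selection criterion is meant to reduce.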
Abstract
In the estimation of causal effects, one common method for removing the influence of confounders is to adjust for variables that satisfy the back-door criterion. However, it is not always possible to uniquely determine sets of such variables. Moreover, real-world data are almost always limited, which means they may be insufficient for statistical estimation. Therefore, we propose criteria for selecting variables from a list of candidate adjustment variables, along with an algorithm to prevent accuracy degradation in causal effect estimation. We initially focus on directed acyclic graphs (DAGs) and then outline specific steps for applying this method to completed partially directed acyclic graphs (CPDAGs). We also present and prove a theorem on the computability of causal effects in CPDAGs. Finally, we demonstrate the practical utility of our method using both existing and artificial data.
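For reference, back-door adjustment for discrete variables is usually written as follows. The symbols $Z$ (a set satisfying the back-door criterion relative to $(X, Y)$) and $z$ (a value of $Z$) are notation assumed here, not taken from the abstract.

$$P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)$$

Each term $P(Y = y \mid X = x, Z = z)$ must be estimated from the samples that fall in the cell $(x, z)$, so when data are limited the choice among valid sets $Z$ affects how reliably the sum can be estimated.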
