Your Assumed DAG is Wrong and Here's How To Deal With It
Kirtan Padh, Zhufeng Li, Cecilia Casolo, Niki Kilbertus
TL;DR
This work tackles causal effect estimation under DAG uncertainty by introducing a gradient-based optimization framework that bounds a target causal query $\mathcal{Q}_{\mathcal{G}}$ over all DAGs $\mathcal{G}$ compatible with partial prior knowledge. It parameterizes the uncertain edges with a matrix $A_{\alpha}$ and optimizes over the continuous parameters $\alpha$ to obtain tight lower and upper bounds from observational data $\mathcal{D}$; the method applies to nonlinear SCMs and large graphs. Unlike approaches that are restricted to a Markov equivalence class (MEC) or that return a single graph, it delivers informative bounds without exhaustive enumeration. Empirical results on synthetic data and on a real-world constraint-based discovery task show that the bounds are both informative and sharp, offering a practical rebuttal to the critique that causal inference relies on an assumed DAG that may be incorrect.
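To make the idea of "bounds over all compatible DAGs" concrete, the following is a minimal sketch of the brute-force baseline that the gradient-based method is meant to avoid, not the paper's algorithm. It enumerates every orientation of the uncertain edges, keeps the acyclic graphs, and takes the minimum and maximum of the estimated effect. Everything here is an illustrative assumption: the toy linear SCM, the variable names `Z`, `T`, `Y`, the helper names, and the choice to estimate the effect by linear regression on the treatment and its parents (a valid adjustment set when there are no hidden confounders).

```python
import itertools
import random


def simulate(n=5000, seed=0):
    """Toy linear SCM (assumed for illustration): Z -> T, Z -> Y, T -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        t = 0.8 * z + rng.gauss(0, 1)
        y = 1.5 * t + 1.0 * z + rng.gauss(0, 1)  # true effect of T on Y is 1.5
        rows.append({"Z": z, "T": t, "Y": y})
    return rows


def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly remove nodes without incoming edges."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        sources = [v for v in remaining if not any(b == v for _, b in edges)]
        if not sources:
            return False  # every remaining node has a parent, so there is a cycle
        remaining -= set(sources)
        edges = {(a, b) for a, b in edges if a in remaining}
    return True


def ols_coef(rows, target, regressors):
    """Coefficient of regressors[0] in an OLS fit of target on regressors."""
    cols = [[r[name] for r in rows] for name in regressors + [target]]
    cols = [[x - sum(c) / len(c) for x in c] for c in cols]  # center the data
    k = len(regressors)
    # Normal equations (X'X) beta = X'y, stored as an augmented k x (k+1) matrix.
    aug = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k + 1)]
           for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        for r in range(k):
            if r != i:
                f = aug[r][i] / aug[i][i]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[i])]
    return aug[0][k] / aug[0][0]


def ate_bounds(rows, nodes, known, uncertain, treatment="T", outcome="Y"):
    """Min and max effect estimate over all acyclic completions of the graph."""
    estimates = {}
    for choice in itertools.product(("->", "<-", "none"), repeat=len(uncertain)):
        edges = list(known)
        for (a, b), state in zip(uncertain, choice):
            if state == "->":
                edges.append((a, b))
            elif state == "<-":
                edges.append((b, a))
        if not is_acyclic(nodes, edges):
            continue
        parents = sorted(a for a, b in edges if b == treatment)
        if outcome in parents:
            estimates[choice] = 0.0  # outcome causes treatment: no effect of T on Y
        else:
            estimates[choice] = ols_coef(rows, outcome, [treatment] + parents)
    return min(estimates.values()), max(estimates.values()), estimates


rows = simulate()
lo, hi, per_graph = ate_bounds(
    rows,
    nodes=["Z", "T", "Y"],
    known=[("T", "Y"), ("Z", "Y")],  # edges the analyst is sure about
    uncertain=[("Z", "T")],          # direction or presence is unknown
)
print(f"effect of T on Y lies in [{lo:.2f}, {hi:.2f}]")
for choice, est in per_graph.items():
    print(choice, round(est, 2))
```

In this toy setup the lower end of the interval comes from the graph in which `Z` is treated as a confounder and adjusted for, and the upper end from the graphs in which it is not. The number of candidate graphs grows as $3^k$ in the number $k$ of uncertain edges, which is the scaling problem that motivates searching over a continuous parameterization instead.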
Abstract
Assuming a directed acyclic graph (DAG) that represents prior knowledge of causal relationships between variables is a common starting point for cause-effect estimation. Existing literature typically invokes hypothetical domain expert knowledge or causal discovery algorithms to justify this assumption. In practice, neither may propose a single DAG with high confidence. Domain experts are hesitant to rule out dependencies with certainty or have ongoing disputes about relationships; causal discovery often relies on untestable assumptions itself, or only provides an equivalence class of DAGs, and is commonly sensitive to hyperparameter and threshold choices. We propose an efficient, gradient-based optimization method that provides bounds for causal queries over a collection of causal graphs that is compatible with imperfect prior knowledge but may still be too large for exhaustive enumeration. Our bounds achieve good coverage and sharpness for causal queries such as average treatment effects in linear and non-linear synthetic settings as well as on real-world data. Our approach aims to provide an easy-to-use and widely applicable rebuttal to the valid critique of "What if your assumed DAG is wrong?".
