Your Assumed DAG is Wrong and Here's How To Deal With It

Kirtan Padh, Zhufeng Li, Cecilia Casolo, Niki Kilbertus

TL;DR

This work tackles causal effect estimation under DAG uncertainty by introducing a gradient-based optimization framework that bounds a target causal query $\mathcal{Q}_{\mathcal{G}}$ across all DAGs $\mathcal{G}$ compatible with partial prior knowledge. It parameterizes the uncertain edges with a matrix $A_{\alpha}$ and searches over the continuous parameters $\alpha$ to obtain lower and upper bounds on the query. The method applies to nonlinear SCMs and large graphs and uses only observational data $\mathcal{D}$. Unlike approaches that are limited to a Markov equivalence class (MEC) or that output a single graph, it delivers bounds without exhaustive enumeration of the candidate DAGs. Empirical results on synthetic data and a real-world constraint-based discovery task indicate that the bounds are both informative and sharp, offering a rebuttal to the critique that causal inference relies on a possibly incorrect assumed DAG.

Abstract

Assuming a directed acyclic graph (DAG) that represents prior knowledge of causal relationships between variables is a common starting point for cause-effect estimation. Existing literature typically invokes hypothetical domain expert knowledge or causal discovery algorithms to justify this assumption. In practice, neither may propose a single DAG with high confidence. Domain experts are hesitant to rule out dependencies with certainty or have ongoing disputes about relationships; causal discovery often relies on untestable assumptions itself, or only provides an equivalence class of DAGs, and is commonly sensitive to hyperparameter and threshold choices. We propose an efficient, gradient-based optimization method that provides bounds for causal queries over a collection of causal graphs that is compatible with imperfect prior knowledge yet may still be too large for exhaustive enumeration. Our bounds achieve good coverage and sharpness for causal queries such as average treatment effects in linear and non-linear synthetic settings as well as on real-world data. Our approach aims at providing an easy-to-use and widely applicable rebuttal to the valid critique of "What if your assumed DAG is wrong?".

Paper Structure

This paper contains 6 sections, 2 equations, 1 figure.

Figures (1)

  • Figure 1: (a) An example illustrating a graph with partial information. Red edges represent forbidden edges and blue dotted edges represent unknown edges. (b)–(f) All plausible DAGs compatible with the information provided in (a).
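
To make the idea behind Figure 1 concrete, the following minimal Python sketch enumerates every DAG compatible with a small piece of partial knowledge by brute force. It is an illustration only, not the paper's method: the three-node graph, the edge lists, and the helper names `is_acyclic` and `compatible_dags` are all hypothetical, and the sketch assumes each unknown edge is a directed candidate that may be either present or absent.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    # Every node is visited exactly when no cycle blocks the ordering.
    return visited == len(nodes)


def compatible_dags(nodes, required, unknown, forbidden=()):
    """Enumerate DAGs that contain all required edges, any subset of the
    unknown edges, and none of the forbidden edges."""
    forbidden = set(forbidden)
    dags = []
    for mask in product([False, True], repeat=len(unknown)):
        edges = set(required) | {e for e, keep in zip(unknown, mask) if keep}
        if edges & forbidden:
            continue
        if is_acyclic(nodes, edges):
            dags.append(frozenset(edges))
    return dags


# Hypothetical example (not the graph from Figure 1):
# treatment T, outcome Y, and a covariate Z whose role is uncertain.
nodes = ["T", "Y", "Z"]
required = [("T", "Y")]                           # known edge
unknown = [("Z", "T"), ("T", "Z"), ("Z", "Y")]    # may be present or absent
forbidden = [("Y", "T"), ("Y", "Z")]              # ruled out by prior knowledge

dags = compatible_dags(nodes, required, unknown, forbidden)
print(len(dags))  # 6: the two subsets containing both Z->T and T->Z are cyclic
```

With k unknown edges this loop visits 2^k candidate graphs, which is the exhaustive enumeration that the gradient-based search described in the abstract is designed to avoid.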