Valid Inference After Causal Discovery

Paula Gradu, Tijana Zrnic, Yixin Wang, Michael I. Jordan

TL;DR

This paper tackles the problem of obtaining valid statistical inferences for causal effects after causal discovery, addressing Freedman’s paradox caused by double dipping. It develops a randomized, differential-privacy–inspired framework that uses max-information bounds to correct downstream confidence intervals, ensuring coverage even when the causal graph is learned from the data. The main contributions include noisy versions of score-based search (noisy-select) and greedy search (noisy-ges) with finite-sample validity guarantees, theoretical connections to adaptive data analysis, and extensive empirical studies showing improved validity and competitive graph quality compared to data splitting. The approach is applicable to observational and interventional settings, scalable to high-dimensional problems, and robust to misspecification, offering a principled path for reliable causal inferences in practice.

Abstract

Causal discovery and causal effect estimation are two fundamental tasks in causal inference. While many methods have been developed for each task individually, statistical challenges arise when applying these methods jointly: estimating causal effects after running causal discovery algorithms on the same data leads to "double dipping," invalidating the coverage guarantees of classical confidence intervals. To this end, we develop tools for valid post-causal-discovery inference. Across empirical studies, we show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates; on the other hand, applying our method provides reliable coverage while achieving more accurate causal discovery than data splitting.

Paper Structure (39 sections, 12 theorems, 36 equations, 11 figures, 7 algorithms)

Key Result

Proposition 1

Suppose that algorithm $\mathcal{A}$ is $\epsilon$-differentially private, and fix any $\gamma\in(0,1)$. Then, we have $I^\gamma_\infty(\mathcal{A}(\mathcal{D});\mathcal{D}) \leq \frac{n}{2}\epsilon^2 + \epsilon \sqrt{n\log(2/\gamma)/2}$.
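As a rough illustration of how this bound could be used, the sketch below evaluates the right-hand side of Proposition 1 and converts it into a corrected error level for downstream intervals. The correction step is an assumption based on the standard definition of approximate max-information: if $I^\gamma_\infty \leq k$ (measured in nats, as the form of the bound suggests), then an event with probability $\delta$ under independence has probability at most $e^{k}\delta + \gamma$ after selection, so targeting miscoverage $\alpha$ suggests $\delta = (\alpha - \gamma)e^{-k}$. The function names are hypothetical and the paper's exact correction may differ.

```python
import math


def max_information_bound(n: int, epsilon: float, gamma: float) -> float:
    """Right-hand side of Proposition 1: (n/2) eps^2 + eps * sqrt(n log(2/gamma) / 2)."""
    if n <= 0 or epsilon < 0 or not (0 < gamma < 1):
        raise ValueError("need n > 0, epsilon >= 0, and gamma in (0, 1)")
    return 0.5 * n * epsilon**2 + epsilon * math.sqrt(n * math.log(2 / gamma) / 2)


def corrected_error_level(alpha: float, n: int, epsilon: float, gamma: float) -> float:
    """Error level to use for the classical interval so that post-selection
    miscoverage is at most alpha.

    Assumption (not taken from the summary): the correction
    delta = (alpha - gamma) * exp(-k), with k the max-information bound in nats.
    """
    if not (gamma < alpha < 1):
        raise ValueError("need gamma < alpha < 1")
    k = max_information_bound(n, epsilon, gamma)
    return (alpha - gamma) * math.exp(-k)


if __name__ == "__main__":
    # Illustrative values only; epsilon matches one setting in the figure captions.
    n, epsilon, gamma, alpha = 1000, 0.02, 0.01, 0.1
    print("max-information bound:", max_information_bound(n, epsilon, gamma))
    print("corrected error level:", corrected_error_level(alpha, n, epsilon, gamma))
```

Under this sketch, a smaller $\epsilon$ gives a smaller bound and therefore a milder correction, at the cost of noisier selection. This is the trade-off that the $\epsilon$ values in the figures explore.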

Figures (11)

  • Figure 1: Probability of error for varying $n$ and $d$ in empty graph for exact selection (left), noisy-select with $\epsilon=0.02$ (middle), and $\epsilon=0.04$ (right).
  • Figure 2: Probability of error for varying $n$ and $d$ in empty graph for classical GES (left), noisy-ges with $\epsilon=0.02$ (middle), and $\epsilon=0.04$ (right).
  • Figure 3: Probability of error for varying $n$ and $d$ in random graph for exact selection (left), noisy-select with $\epsilon=0.02$ (middle), and $\epsilon=0.04$ (right).
  • Figure 4: Probability of error for varying $n$ and $d$ in random graph for classical GES (left), noisy-ges with $\epsilon=0.02$ (middle), and $\epsilon=0.04$ (right).
  • Figure 5: Comparison of noisy-select with varying $\epsilon$ and three data splitting baselines in terms of SHD (left) and interval widths (right) for a random graph.
  • ...and 6 more figures

Theorems & Definitions (25)

  • Definition 1: Max-information (Dwork et al., 2015)
  • Definition 2: Differential privacy (Dwork et al., 2006)
  • Proposition 1 (Dwork et al., 2015)
  • Definition 3: Score sensitivity
  • Lemma 1
  • Theorem 1
  • Proposition 2
  • Definition 4: Decomposability
  • Definition 5: Local score sensitivity
  • Lemma 2
  • ...and 15 more