Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs

Jacqueline Maasch, Weishen Pan, Shantanu Gupta, Volodymyr Kuleshov, Kyra Gan, Fei Wang

TL;DR

Local Discovery by Partitioning (LDP) tackles global causal discovery’s intractability in observational data by organizing covariates into eight universal causal partitions relative to an exposure–outcome pair $(X,Y)$ and identifying a valid adjustment set (VAS) under the backdoor criterion in polynomial time. It operates without pretreatment or parametric data-generating assumptions, using constraint-based tests with worst-case $O(|\boldsymbol{Z}|^2)$ independence checks, and provides intermediate partition labels to guide inference. The approach yields dramatic runtime gains (e.g., $1400$–$2500\times$ faster than PC on benchmarks) and improved downstream ATE estimation precision compared with baselines, while remaining flexible to multiple adjustment criteria such as common cause, disjunctive cause, and outcome criteria. The paper also presents identifiability results showing VAS validity under latent confounding when the Z5 criterion is satisfied, supported by extensive experiments on synthetic graphs and a real-world Mildew benchmark. Overall, LDP offers a practical, locally focused alternative to global causal discovery for unbiased causal inference in complex, possibly latent, settings.

Abstract

Causal discovery is crucial for causal inference in observational studies, as it can enable the identification of valid adjustment sets (VAS) for unbiased effect estimation. However, global causal discovery is notoriously hard in the nonparametric setting, with exponential time and sample complexity in the worst case. To address this, we propose local discovery by partitioning (LDP): a local causal discovery method that is tailored for downstream inference tasks without requiring parametric and pretreatment assumptions. LDP is a constraint-based procedure that returns a VAS for an exposure-outcome pair under latent confounding, given sufficient conditions. The total number of independence tests performed is worst-case quadratic with respect to the cardinality of the variable set. Asymptotic theoretical guarantees are numerically validated on synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baseline discovery algorithms, with LDP outperforming on confounder recall, runtime, and test count for VAS discovery. Notably, LDP ran at least 1300x faster than baselines on a benchmark.
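To make the role of a valid adjustment set concrete, the following minimal sketch simulates a linear-Gaussian system with a single confounder and compares the unadjusted and adjusted effect estimates. It is not LDP itself: the confounder is assumed known, and the function names, coefficients, and sample size are all illustrative assumptions rather than anything taken from the paper.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def residuals(v, z):
    """Residuals of v after a least-squares regression on z (with intercept)."""
    b = slope(z, v)
    mv, mz = mean(v), mean(z)
    return [(vi - mv) - b * (zi - mz) for vi, zi in zip(v, z)]


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical data: Z confounds X and Y; X has a direct effect on Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [1.5 * zi + rng.gauss(0, 1) for zi in z]
    y = [true_effect * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return z, x, y


z, x, y = simulate()

# Unadjusted estimate: biased, because the backdoor path X <- Z -> Y is open.
naive = slope(x, y)

# Adjusted estimate: partial out Z from both X and Y, then regress the
# residuals (equivalent to including Z as a covariate in a linear model).
adjusted = slope(residuals(x, z), residuals(y, z))

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

Under these assumed coefficients, the unadjusted slope converges to $2 + 4.5/3.25 \approx 3.38$, whereas the adjusted slope converges to the true effect of $2$. This gap is the bias that a correctly discovered adjustment set removes.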

Paper Structure (49 sections, 20 theorems, 2 equations, 21 figures, 14 tables, 1 algorithm)

Key Result

Theorem 3.1

The eight partitions defined in Table \ref{tab:partitions} are exhaustive and mutually exclusive, such that any variable $Z$ falls uniquely under one partition category.
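
Stated symbolically, and assuming the partitions are indexed $\mathbf{Z}_1, \dots, \mathbf{Z}_8$ in line with the $\mathbf{Z}_1$ and $Z_5$ notation of the figure captions (the exact indexing is an assumption here), exhaustiveness and mutual exclusivity amount to:

$$
\mathbf{Z} = \bigcup_{i=1}^{8} \mathbf{Z}_i,
\qquad
\mathbf{Z}_i \cap \mathbf{Z}_j = \emptyset \quad \text{for all } i \neq j .
$$

Individual partitions may be empty, so for any given graph fewer than eight of the sets may be populated.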

Figures (21)

  • Figure 1: For sample sizes $n \leq 10\mathsf{k}$, classic constraint-based algorithm PC fails to causally partition the data with respect to $\{X,Y\}$ (e.g., by misidentifying confounder $Z_1$ and instrument $Z_5$). Data generating process is linear-Gaussian (Fisher-z tests; $\alpha = 0.005$). See Figure \ref{fig:motive_times_tests} for details.
  • Figure 2: All potential acyclic triples that can be induced by $X$, $Y$, and a single $Z$ when paths are restricted to a length of 1.
  • Figure 3: Given $X$ and $Y$, we can project any ground truth DAG onto a reduced 10-node DAG where nodes represent partition sets (which may be empty), arrows signify both adjacencies and indirect active paths (one or more), and inter-partition relations are abstracted away. The dashed edge suggests a possible null relation. Conditioning on $\mathbf{Z}_1$ blocks all backdoor paths for $\{X,Y\}$.
  • Figure 4: Each step of Algorithm 1 reveals additional information about the partitions of $\mathbf{Z}$ without requiring LDP to learn the full causal graph. Nodes that are fully colored are fully discovered, partial coloring denotes partial knowledge, and no coloring denotes no knowledge.
  • Figure 5: Total tests performed under an independence oracle (top) and mean runtime over 100 replicates (bottom) as the cardinality of $\mathbf{Z}$ increases, with 95% confidence intervals in shaded regions. Each DAG resembles Figure 3 with equal cardinality per partition ($[1,10]$). Results are reported for LDP and PC. LDECC and MB-by-MB curves overlapped with PC, with PC outperforming. Exponential, quadratic, $x \log_2(x)$, and linear curves (in tests and milliseconds) serve as comparison. Table \ref{tab:time_tests} reports raw data.
  • ...and 16 more figures
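
The Fisher-z tests named in the Figure 1 caption can be illustrated with a minimal sketch of a conditional independence test for linear-Gaussian data, conditioning on a single variable. This shows the generic test only, not the paper's implementation. The function names, the simulated data, and the restriction to one conditioning variable are assumptions made for illustration; only the threshold $\alpha = 0.005$ is taken from the caption.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(x, y, z):
    """Partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: correlation is zero.

    r: (partial) correlation, n: sample size, k: number of conditioning variables.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    stat = math.sqrt(n - k - 3) * 0.5 * math.log((1 + r) / (1 - r))
    # 2 * (1 - Phi(|stat|)) written with the complementary error function
    return math.erfc(abs(stat) / math.sqrt(2))


# Hypothetical data: Z is a common cause of X and Y, with no X -> Y edge.
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

alpha = 0.005
p_marginal = fisher_z_pvalue(corr(x, y), n, k=0)
p_conditional = fisher_z_pvalue(partial_corr(x, y, z), n, k=1)

print("X independent of Y?          ", p_marginal >= alpha)
print("X independent of Y given Z?  ", p_conditional >= alpha)
```

In this simulated setup, $X$ and $Y$ are marginally dependent through $Z$ but independent once $Z$ is conditioned on. Constraint-based methods read causal structure from this kind of pattern.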

Theorems & Definitions (43)

  • Definition 2.1: $D$-separation (Spirtes et al., 2000)
  • Definition 2.2: Active paths (Spirtes et al., 2000)
  • Definition 2.3: Backdoor path (Pearl, 2009)
  • Definition 2.4: Valid adjustment under the backdoor criterion (Pearl, 2009)
  • Definition 2.5: Confounder (VanderWeele and Shpitser, 2013)
  • Theorem 3.1
  • proof
  • Definition 3.2: Inter-partition active path
  • Remark 4.1: LDP is foremost a VAS discovery method, not a partition labeling method
  • Lemma 4.2
  • ...and 33 more