Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables

Zheng Li, Xichen Guo, Feng Xie, Yan Zeng, Hao Zhang, Zhi Geng

TL;DR

This work tackles causal effect estimation from observational data in the presence of latent confounding by introducing a fully local covariate-selection framework. It develops a Markov-blanket–based theory for identifying valid adjustment sets that depend only on testable independencies among observed variables, even when latent variables exist. The Local Search Adjustment Sets (LSAS) algorithm learns Markov blankets (MBs) and applies two regression criteria to determine whether a causal effect exists and to estimate it, or to report inconclusiveness, with formal guarantees of soundness and completeness under oracle tests. Empirically, LSAS outperforms global-graph–based and other local methods on synthetic benchmarks and a real birth-weight dataset, achieving higher accuracy while using far fewer independence tests, which supports its use for scalable causal inference with latent confounding.

Abstract

Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science. A key component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Most existing methods for covariate selection assume the absence of latent variables and rely on learning the global network structure among variables. However, identifying the global structure can be unnecessary and inefficient, especially when the primary interest lies in estimating the effect of a treatment variable on an outcome variable. To address this limitation, we propose a novel local learning approach for covariate selection in nonparametric causal effect estimation, which accounts for the presence of latent variables. Our approach leverages testable independence and dependence relationships among observed variables to identify a valid adjustment set for a target causal relationship, ensuring both soundness and completeness under standard assumptions. We validate the effectiveness of our algorithm through extensive experiments on both synthetic and real-world data.

Paper Structure

This paper contains 36 sections, 6 theorems, 2 equations, 14 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Let $\mathcal{D}$ be an observational dataset containing an ordered variable pair $(X, Y)$ and a set of covariates $\mathbf{O}$. There exists a subset of $\mathbf{O}$ that is an adjustment set w.r.t. $(X, Y)$ if and only if there exists a subset of $\mathit{MB(Y)\setminus \{X\}}$ that is an adjustment set w.r.t. $(X, Y)$.
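One practical reading of Theorem 1 is that a search for an adjustment set can be restricted to subsets of $\mathit{MB(Y)\setminus \{X\}}$ instead of all subsets of $\mathbf{O}$. The following is a minimal sketch of that restricted search, not the LSAS algorithm itself. The function name `search_adjustment_set` and the callback `is_valid` are hypothetical: `is_valid` stands in for whatever test decides that a candidate set is a valid adjustment set, which this summary does not spell out, and the Markov blanket is assumed to have been learned already.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Optional


def search_adjustment_set(
    mb_y: Iterable[str],
    x: str,
    is_valid: Callable[[FrozenSet[str]], bool],
) -> Optional[FrozenSet[str]]:
    """Return a smallest subset of MB(Y) \\ {X} accepted by `is_valid`.

    `mb_y` is an already-learned Markov blanket of the outcome Y.
    `is_valid` is a hypothetical placeholder for the validity test
    (assumption: it is supplied by the caller, not defined here).
    Returns None when no subset is accepted.
    """
    # Candidate covariates: the Markov blanket of Y without the treatment X.
    candidates = sorted(set(mb_y) - {x})
    # Enumerate subsets by increasing size so the first hit is a smallest one.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            candidate_set = frozenset(subset)
            if is_valid(candidate_set):
                return candidate_set
    return None
```

The enumeration is exponential in the size of $\mathit{MB(Y)\setminus \{X\}}$, which is typically much smaller than $\mathbf{O}$. By Theorem 1, a `None` result under a correct validity test means that no subset of $\mathbf{O}$ is an adjustment set either.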

Figures (14)

  • Figure 1: Example MAGs with treatment $X$ and outcome $Y$. Nodes shaded in green represent a valid adjustment set. (a) Both the global search method EHS and the local search method CEELS identify the adjustment set. (b) An example adapted from cheng2022local (see Fig. 4 there), in which CEELS fails to select the adjustment set despite the presence of a COSO variable $V_1$. (c) An example without a COSO variable, where the adjustment set can still be found locally.
  • Figure 2: The illustrative example for MB in a MAG, where $Y$ is the target of interest and the green nodes belong to $\mathit{MB(Y)}$.
  • Figure 3: (a) An underlying causal DAG (adapted from haggstrom2018data), in which $U_1$ and $U_2$ are unobserved variables. (b) The corresponding MAG of the DAG shown in (a).
  • Figure 4: (a) A causal DAG, where $U_i$, $i=1,\dots,4$, are latent variables. (b) The corresponding MAG of the DAG in (a).
  • Figure 5: Performance of five algorithms on Specific Structures, MILDEW and WIN95PTS.
  • ...and 9 more figures

Theorems & Definitions (33)

  • Definition 1: Amenability (van2014constructing; perkovi2018complete)
  • Definition 2: Forbidden set $\mathit{Forb(X, Y)}$ (perkovi2018complete)
  • Definition 3: Generalized adjustment criterion (perkovi2018complete)
  • Example 1: Generalized adjustment criterion
  • Remark 1
  • Remark 2
  • Definition 4: Adjustment set in Markov blanket
  • Remark 3
  • Theorem 1: Existence of $\mathit{\mathcal{A}_{MB}(X, Y)}$
  • Example 2
  • ...and 23 more