Automating the Selection of Proxy Variables of Unmeasured Confounders

Feng Xie, Zhengming Chen, Shanshan Luo, Wang Miao, Ruichu Cai, Zhi Geng

TL;DR

This paper first extends the existing proxy variable estimator, originally addressing a single unmeasured confounder, to accommodate scenarios where multiple unmeasured confounders exist between the treatments and the outcome, and presents two different sets of precise identifiability conditions for selecting valid proxy variables of unmeasured confounders.

Abstract

Recently, interest has grown in the use of proxy variables of unobserved confounding for inferring the causal effect in the presence of unmeasured confounders from observational data. One difficulty inhibiting their practical use is finding valid proxy variables of unobserved confounding for a target causal effect of interest. These proxy variables are typically justified by background knowledge. In this paper, we investigate the estimation of the causal effects of multiple treatments on a single outcome, all of which are affected by unmeasured confounders, within a linear causal model, without prior knowledge of the validity of proxy variables. To be more specific, we first extend the existing proxy variable estimator, originally addressing a single unmeasured confounder, to accommodate scenarios where multiple unmeasured confounders exist between the treatments and the outcome. Subsequently, we present two different sets of precise identifiability conditions for selecting valid proxy variables of unmeasured confounders, based on the second-order statistics and higher-order statistics of the data, respectively. Moreover, we propose two data-driven methods for the selection of proxy variables and for the unbiased estimation of causal effects. Theoretical analysis demonstrates the correctness of our proposed algorithms. Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach.

Paper Structure

This paper contains 37 sections, 16 theorems, 56 equations, 15 figures, 2 algorithms.

Key Result

Proposition 1

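Assume the system is a linear causal model, i.e., all variables are continuous and the causal relationships among variables are linear. Further, assume that there exists one unmeasured confounder $U$ that affects both the treatment $X_k$ and the outcome $Y$, and that $Z$ and $W$ are an NCE (negative control exposure) and an NCO (negative control outcome) of the confounder $U$. Then the causal effect of $X_k$ on $Y$ is identified and can be expressed in terms of the covariances among $\{X_k, Y, Z, W\}$, where $\sigma_{X_kY}$ is the covariance between $X_k$ and $Y$, etc.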
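As a hedged illustration of Proposition 1: working directly from a linear single-confounder model (so the paper's exact notation may differ), a standard ratio-of-covariances form of the proxy estimator is $\beta = \frac{\sigma_{X_kY}\sigma_{ZW} - \sigma_{X_kW}\sigma_{ZY}}{\sigma_{X_kX_k}\sigma_{ZW} - \sigma_{X_kW}\sigma_{ZX_k}}$, valid when the denominator is non-zero. The sketch below simulates a hypothetical model ($U \to X_k, Y, Z, W$; $Z \to X_k$; $X_k \to Y$ with effect 2.0; all coefficients chosen arbitrarily), in which $Z$ affects $Y$ only through $X_k$ and $W$ is affected by neither $X_k$ nor $Z$. It then compares this estimator with the naive regression slope. The helpers `simulate` and `proxy_estimate` are illustrative names, not taken from the paper.

```python
import numpy as np


def simulate(n, beta, rng):
    """Hypothetical linear model with one unmeasured confounder u (coefficients arbitrary)."""
    u = rng.normal(size=n)                          # unmeasured confounder U
    z = u + rng.normal(size=n)                      # NCE: caused by U
    w = u + rng.normal(size=n)                      # NCO: caused by U, not by X_k or Z
    x = u + 0.5 * z + 1.5 * rng.normal(size=n)      # treatment X_k (Z may affect it)
    y = beta * x + 1.2 * u + rng.normal(size=n)     # outcome Y, confounded by U
    return x, y, z, w


def proxy_estimate(x, y, z, w):
    """Ratio-of-covariances proxy estimator for a single unmeasured confounder."""
    s = np.cov(np.vstack([x, y, z, w]))  # 4x4 sample covariance, variable order x, y, z, w
    s_xx, s_xy, s_xz, s_xw = s[0, 0], s[0, 1], s[0, 2], s[0, 3]
    s_zy, s_zw = s[2, 1], s[2, 3]
    return (s_xy * s_zw - s_xw * s_zy) / (s_xx * s_zw - s_xw * s_xz)


rng = np.random.default_rng(0)
x, y, z, w = simulate(200_000, beta=2.0, rng=rng)

naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x, biased by U
proxy = proxy_estimate(x, y, z, w)
print(f"naive: {naive:.3f}  proxy: {proxy:.3f}")
```

With these coefficients the population OLS slope is $2 + 1.8/4.75 \approx 2.38$, so the naive estimate is biased upward, while the proxy estimate should land close to 2.0. The estimate becomes unstable when the denominator is close to zero.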

Figures (15)

  • Figure 1: A typical confounder proxy causal diagram. $Z$ and $W$ are NCE and NCO of unmeasured confounder $U$ for the causal relationship $X_k \to Y$.
  • Figure 2: Diagram of one possible violation of the NCE and NCO assumptions. Dashed lines represent active paths. The symbol "✘" marks active paths that should not exist.
  • Figure 3: A simple causal graph involving 6 potential treatments and one outcome.
  • Figure 4: A linear causal model with any of the graphical structures above entails all possible rank constraints in the marginal covariance matrix of $\{X_k, Y, Z, W\}$.
  • Figure 5: Performance of NAIVE, FindNC, Proxy-Rank, and Proxy-GIN on the Gaussian case.
  • ...and 10 more figures
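Figure 4 refers to rank constraints in the covariance matrix of $\{X_k, Y, Z, W\}$. The following is a hedged sketch of one such constraint, not the paper's Proxy-Rank procedure. Under a hypothetical single-confounder linear model ($U \to X_k, Y, Z, W$; $Z \to X_k$; $X_k \to Y$ with effect 2.0), the $2 \times 2$ cross-covariance matrix between $\{X_k, Z\}$ and $\{Y - b X_k, W\}$ has determinant $(\beta - b)\det\Sigma_{\{X_k,Z\},\{X_k,W\}}$. It therefore has rank at most 1 exactly when $b$ equals the true effect $\beta$ (given a non-zero denominator), and solving for that $b$ gives the ratio $\det\Sigma_{\{X_k,Z\},\{Y,W\}} / \det\Sigma_{\{X_k,Z\},\{X_k,W\}}$. The helper `cross_cov` and all coefficients are illustrative assumptions.

```python
import numpy as np


def cross_cov(rows_a, rows_b):
    """Sample cross-covariance matrix Cov(rows_a, rows_b) for lists of 1-D arrays."""
    a = np.vstack(rows_a)
    b = np.vstack(rows_b)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return a @ b.T / (a.shape[1] - 1)


# Hypothetical model: U -> X_k, Y, Z, W; Z -> X_k; X_k -> Y (coefficients arbitrary).
rng = np.random.default_rng(1)
n, beta = 200_000, 2.0
u = rng.normal(size=n)
z = u + rng.normal(size=n)
w = u + rng.normal(size=n)
x = u + 0.5 * z + 1.5 * rng.normal(size=n)
y = beta * x + 1.2 * u + rng.normal(size=n)

# For a 2x2 matrix, "rank <= 1" is the same as "determinant == 0".
det_true = np.linalg.det(cross_cov([x, z], [y - beta * x, w]))   # close to 0
det_wrong = np.linalg.det(cross_cov([x, z], [y - 1.0 * x, w]))   # clearly non-zero

# Solving det(Cov({X, Z}, {Y - b X, W})) = 0 for b yields a ratio of determinants.
beta_hat = (np.linalg.det(cross_cov([x, z], [y, w]))
            / np.linalg.det(cross_cov([x, z], [x, w])))
print(f"det at true beta: {det_true:.4f}, at b=1: {det_wrong:.4f}, beta_hat: {beta_hat:.3f}")
```

In this model the denominator determinant is 1.75 in population, so the determinant at $b = 1$ should be near 1.75 while the one at the true effect should be near 0. In finite samples a rank test would replace the exact-zero check with a statistical test.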

Theorems & Definitions (49)

  • Definition 1: NCE and NCO miao2018proxy, shi2020selective
  • Definition 2: Connected (Disconnected) NCE and NCO
  • Definition 3: Quadruple-disconnected NC
  • Example 1
  • Proposition 1: Proxy Variables Estimator kuroki2014measurement
  • Remark 1
  • Proposition 2: Extended Proxy Variables Estimator
  • Remark 2
  • Remark 3
  • Definition 4: Rank Constraint
  • ...and 39 more