Causal Discovery via Conditional Independence Testing with Proxy Variables

Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang

TL;DR

This paper designs a proxy-based hypothesis test for identifying causal relationships in the presence of unobserved variables. The test attains ideal power in the large-sample limit, and its effectiveness is demonstrated on synthetic and real-world data.

Abstract

Distinguishing causal connections from correlations is important in many scenarios. However, the presence of unobserved variables, such as a latent confounder, can introduce bias into the conditional independence tests commonly employed in constraint-based causal discovery to identify causal relations. To address this issue, existing methods introduced proxy variables to adjust for the bias caused by unobservedness. However, these methods were either limited to categorical variables or relied on strong parametric assumptions for identification. In this paper, we propose a novel hypothesis-testing procedure that can effectively examine the existence of a causal relationship over continuous variables, without any parametric constraint. Our procedure is based on discretization, which, under completeness conditions, asymptotically establishes a linear equation whose coefficient vector is identifiable under the causal null hypothesis. Based on this, we introduce our test statistic and demonstrate its asymptotic level and power. We validate the effectiveness of our procedure using both synthetic and real-world data.
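
For intuition, the linear equation can be sketched in the fully discrete case, where no discretization is needed. This is a sketch under assumptions that go beyond the abstract: $X$ is the candidate cause, $Y$ the outcome (the symbol $Y$ is assumed here), $W$ a proxy satisfying $X \perp\!\!\!\perp W \mid U$, the null hypothesis is $H_0: X \perp\!\!\!\perp Y \mid U$, and $P(W \mid U)$ is square and invertible. Each matrix holds conditional probabilities, with rows indexed by the first argument and columns by the conditioning variable, and $P(y \mid X)$ denotes the row vector of $P(Y = y \mid X = x)$ over the levels $x$.

$$
\begin{aligned}
P(W \mid X) &= P(W \mid U)\, P(U \mid X) && \text{(since } X \perp\!\!\!\perp W \mid U\text{)} \\
P(y \mid X) &= P(y \mid U)\, P(U \mid X) && \text{(under } H_0\text{)} \\
            &= \underbrace{P(y \mid U)\, P(W \mid U)^{-1}}_{\theta^{\top}}\; P(W \mid X).
\end{aligned}
$$

When $P(W \mid X)$ has full row rank, $\theta$ is the unique solution of $P(y \mid X) = \theta^{\top} P(W \mid X)$. If $X$ has more levels than $W$, the system is overdetermined, so a violation of $H_0$ can show up as a nonzero residual.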

Paper Structure

This paper contains 27 sections, 7 theorems, 65 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Proposition 4.4

Suppose the completeness assumption holds. Then, for any discretization $\tilde{W}$ of $W$, there exists a discretization $\tilde{X}$ of $X$ such that the matrix $P(\tilde{W}|\tilde{X})$ has full row rank. Similarly, there also exists a discretization $\tilde{U}$ of $U$ such that the matrix $P(\tilde{W}|\til
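
The rank condition in the proposition can be checked numerically on simulated data. The following is a minimal sketch, not the paper's procedure: the helper `conditional_matrix`, the quantile-based binning, the bin counts, and the Gaussian data-generating model are all assumptions made for illustration.

```python
import numpy as np


def conditional_matrix(w, x, n_w_bins, n_x_bins):
    """Estimate P(W~ | X~): rows index bins of W, columns index bins of X.

    Quantile bin edges are an illustrative choice (hypothetical, not from
    the source); they guarantee that every bin of X receives samples.
    """
    w_edges = np.quantile(w, np.linspace(0, 1, n_w_bins + 1)[1:-1])
    x_edges = np.quantile(x, np.linspace(0, 1, n_x_bins + 1)[1:-1])
    w_idx = np.digitize(w, w_edges)  # bin labels in 0 .. n_w_bins - 1
    x_idx = np.digitize(x, x_edges)  # bin labels in 0 .. n_x_bins - 1
    counts = np.zeros((n_w_bins, n_x_bins))
    np.add.at(counts, (w_idx, x_idx), 1.0)  # joint histogram of (W~, X~)
    # Normalise each column so it becomes a distribution over W~ given X~.
    return counts / counts.sum(axis=0, keepdims=True)


# Assumed toy model: U is latent, X and W are noisy functions of U only,
# so X and W are independent given U.
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)
x = u + 0.5 * rng.normal(size=n)
w = u + 0.5 * rng.normal(size=n)

P = conditional_matrix(w, x, n_w_bins=3, n_x_bins=6)
rank = np.linalg.matrix_rank(P)
smallest_sv = np.linalg.svd(P, compute_uv=False).min()
print("shape:", P.shape, "rank:", rank, "smallest singular value:", smallest_sv)
```

An estimated matrix is full rank almost surely because of sampling noise, so the smallest singular value is the more informative diagnostic: a value near zero would suggest that the chosen bins of $X$ carry too little information about $\tilde{W}$.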

Figures (6)

  • Figure 1: Causal diagrams illustrating causal discovery with proxy variables. (a) and (b) respectively represent the cases where $U$ is a latent confounder and a latent mediator. Note that our procedure is not restricted to these diagrams, but can apply to any scenario satisfying $X \perp\!\!\!\perp W \mid U$ (Kuroki and Pearl, 2014).
  • Figure 2: Type I and type II error rates of our testing procedure and baseline methods. Note that for a valid testing procedure, the type I error should be close to the significance level $\alpha$ (the dashed line), and the type II error should be close to zero.
  • Figure 3: Discretization error with respect to the bin length. Left: setting I with the confounding graph in Fig. 1 (a). Right: setting II with the mediation graph in Fig. 1 (b). For both settings, the blue line corresponds to the case where the smoothness condition, i.e., the TV smoothness assumption, holds, whereas the orange line corresponds to the case where the data is generated from a nonsmooth model.
  • Figure 4: Type I and type II error rates with respect to the bin number and sample size. We consider two settings for data generation, with setting I (left) using the confounding graph in Fig. 1 (a), and setting II (right) using the mediation graph in Fig. 1 (b).
  • Figure 5: Illustration of causal discovery in sepsis. Observable variables are marked in gray. WBC denotes the count of White Blood Cells, which is a common biomarker used to assess a patient's response to medicines. By using the blood pressure as the proxy variable ($W$) for the health status ($U$), our goal is to determine whether the edge $\mathrm{Medicine} \to \mathrm{WBC}$ exists or not.
  • ...and 1 more figure
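
The error rates described in the Figure 2 caption can be estimated by Monte Carlo simulation. The sketch below illustrates only the evaluation loop: it uses a Pearson correlation test from SciPy as a stand-in, not the paper's proxy-based test, and the simulators, sample size, and effect size are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def rejection_rate(simulate, test_pvalue, alpha=0.05, n_reps=500, seed=0):
    """Fraction of simulated datasets on which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        x, y = simulate(rng)
        if test_pvalue(x, y) < alpha:
            rejections += 1
    return rejections / n_reps


def simulate_null(rng, n=200):
    # Null holds: x and y are generated independently.
    return rng.normal(size=n), rng.normal(size=n)


def simulate_alternative(rng, n=200):
    # Null fails: y depends linearly on x (assumed effect size 0.5).
    x = rng.normal(size=n)
    return x, 0.5 * x + rng.normal(size=n)


def pearson_pvalue(x, y):
    # Stand-in test (hypothetical choice), not the paper's test statistic.
    _, p_value = stats.pearsonr(x, y)
    return p_value


alpha = 0.05
# Type I error: rejection rate when the null is true.
type_i = rejection_rate(simulate_null, pearson_pvalue, alpha=alpha)
# Type II error: non-rejection rate when the null is false.
type_ii = 1.0 - rejection_rate(simulate_alternative, pearson_pvalue, alpha=alpha)
print(f"type I error: {type_i:.3f} (target {alpha}), type II error: {type_ii:.3f}")
```

Swapping `pearson_pvalue` for any function that maps a dataset to a p-value lets the same loop compare several tests on identical simulated data.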

Theorems & Definitions (35)

  • Remark 3.1
  • Example 4.2: ANM with completeness
  • Remark 4.3
  • Proposition 4.4
  • Remark 4.5
  • Remark 4.6
  • Example 4.8: ANM with TV smoothness
  • Proposition 4.9
  • Definition 4.10: Tight distribution
  • Example 4.12: ANM with tightness
  • ...and 25 more