Causal Discovery via Conditional Independence Testing with Proxy Variables

Mingzhou Liu; Xinwei Sun; Yu Qiao; Yizhou Wang

TL;DR

This paper proposes a proxy-variable-based hypothesis test for identifying causal relationships in the presence of unobserved variables. The test has asymptotically ideal power, and its effectiveness is demonstrated on synthetic and real-world data.

Abstract

Distinguishing causal connections from correlations is important in many scenarios. However, the presence of unobserved variables, such as latent confounders, can introduce bias into the conditional independence tests commonly employed in constraint-based causal discovery for identifying causal relations. To address this issue, existing methods introduced proxy variables to adjust for the bias caused by unobservedness. However, these methods were either limited to categorical variables or relied on strong parametric assumptions for identification. In this paper, we propose a novel hypothesis-testing procedure that can effectively examine the existence of a causal relationship over continuous variables, without any parametric constraint. Our procedure is based on discretization, which, under completeness conditions, asymptotically establishes a linear equation whose coefficient vector is identifiable under the causal null hypothesis. Based on this, we introduce our test statistic and demonstrate its asymptotic level and power. We validate the effectiveness of our procedure using both synthetic and real-world data.
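
To make the "linear equation" in the abstract more concrete, the following is a sketch of the kind of relation that arises in the purely categorical case. It is not the paper's exact statement: the outcome symbol $Y$, the null hypothesis written as $X \perp\!\!\!\perp Y \mid U$, and the invertibility of $P(W|U)$ are assumptions made here for illustration, and the discretization error terms that the paper analyzes for continuous variables are omitted.

```latex
% Assumptions (illustrative): X, W, U, Y categorical; X \perp W | U;
% null hypothesis H_0: X \perp Y | U; P(W|U) is a square invertible matrix.
\begin{align*}
P(y \mid x) &= \sum_{u} P(y \mid u)\, P(u \mid x)
  && \text{(by } H_0\text{)} \\
P(w \mid x) &= \sum_{u} P(w \mid u)\, P(u \mid x)
  && \text{(by } X \perp\!\!\!\perp W \mid U\text{)} \\
\intertext{Writing these in matrix form and eliminating $P(U \mid X)$ gives}
P(y \mid X) &= \underbrace{P(y \mid U)\, P(W \mid U)^{-1}}_{\text{coefficient vector}}\; P(W \mid X).
\end{align*}
```

Under these assumptions, the observable row vector $P(y|X)$ is a linear function of the observable matrix $P(W|X)$, and a test can ask whether such a linear relation is consistent with the data.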

Paper Structure (27 sections, 7 theorems, 65 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Related works
Preliminary
Methodology
Discretization under completeness
Discretization error analysis
Hypothesis test
Experiment
Synthetic data
Application to sepsis disease
Conclusions and Discussions
Notations & definitions
Discussion with works for causal direction identification
Discretization under completeness
Details of Example 4.2: ANM with completeness
...and 12 more sections

Key Result

Proposition 4.4

Suppose the completeness assumption (Asm. asm.comp-main) holds. Then, for any discretization $\tilde{W}$ of $W$, there exists a discretization $\tilde{X}$ of $X$ such that the matrix $P(\tilde{W}|\tilde{X})$ has full row rank. Similarly, there also exists a discretization $\tilde{U}$ of $U$ such that the matrix ...
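
As a rough numerical illustration of the full-row-rank property, the sketch below bins simulated continuous samples and estimates $P(\tilde{W}|\tilde{X})$ from counts. The data-generating model, the quantile-based binning, the bin numbers, and the helper names `discretize` and `conditional_matrix` are all assumptions made for this example; the proposition only asserts that a suitable discretization exists and does not say that quantile bins are the right choice.

```python
import numpy as np

def discretize(values, n_bins):
    """Map continuous samples to bin indices 0..n_bins-1 using empirical quantiles."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, values, side="right")

def conditional_matrix(w_bins, x_bins, n_w, n_x):
    """Estimate P(W~ = i | X~ = j) as an (n_w, n_x) matrix whose columns sum to 1."""
    counts = np.zeros((n_w, n_x))
    np.add.at(counts, (w_bins, x_bins), 1.0)  # joint counts per (w bin, x bin)
    col_totals = counts.sum(axis=0, keepdims=True)
    return counts / np.maximum(col_totals, 1.0)

# Toy model (an assumption): X and W are both noisy functions of a latent U,
# so X and W are independent given U but dependent marginally.
rng = np.random.default_rng(0)
n = 20000
u = rng.normal(size=n)
x = u + 0.5 * rng.normal(size=n)
w = u + 0.5 * rng.normal(size=n)

n_w, n_x = 3, 6  # fewer W bins than X bins, so full row rank means rank n_w
p_w_given_x = conditional_matrix(discretize(w, n_w), discretize(x, n_x), n_w, n_x)
rank = np.linalg.matrix_rank(p_w_given_x)
print(p_w_given_x.round(3))
print("rank:", rank)
```

In practice the smallest singular value of the estimated matrix is a more informative diagnostic than the integer rank, since sampling noise alone can make an estimated matrix numerically full rank.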

Figures (6)

Figure 1: Causal diagrams illustrating causal discovery with proxy variables. (a) and (b) respectively represent the cases where $U$ is a latent confounder and a latent mediator. Note that our procedure is not restricted to these diagrams, but can apply to any scenario satisfying $X \perp\!\!\!\perp W \mid U$ (Kuroki and Pearl, 2014).
Figure 2: Type I and type II error rates of our testing procedure and baseline methods. Note that for a valid testing procedure, the type I error should be close to the significance level $\alpha$ (the dashed line), and the type II error should be close to zero.
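
The two error rates in Figure 2 can be estimated by simulation for any test that returns a p-value. The sketch below shows the bookkeeping with a stand-in test, a two-sided z-test for a zero mean with known unit variance; it is not the paper's procedure, and the sample sizes, effect size, and function names are assumptions chosen for illustration.

```python
import math
import numpy as np

def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, assuming unit variance (stand-in test)."""
    z = sample.mean() * math.sqrt(sample.size)
    return math.erfc(abs(z) / math.sqrt(2.0))

def rejection_rate(mean, n_samples, n_trials, alpha, rng):
    """Fraction of simulated datasets on which the test rejects H0 at level alpha."""
    rejections = 0
    for _ in range(n_trials):
        sample = rng.normal(loc=mean, scale=1.0, size=n_samples)
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_trials

rng = np.random.default_rng(0)
alpha = 0.05
# Type I error: how often H0 is rejected when it is true (mean = 0).
type_i = rejection_rate(0.0, 200, 2000, alpha, rng)
# Type II error: how often H0 is retained when it is false (mean = 0.5).
type_ii = 1.0 - rejection_rate(0.5, 200, 2000, alpha, rng)
print(f"type I: {type_i:.3f}, type II: {type_ii:.3f}")
```

For a valid test, the estimated type I rate should fluctuate around `alpha`, and the type II rate should shrink toward zero as the sample size grows.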
Figure 3: Discretization error with respect to the bin length. Left: setting I with the confounding graph in Fig. 1 (a). Right: setting II with the mediation graph in Fig. 1 (b). For both settings, the blue line corresponds to the case where the smoothness condition (Asm. asm.tv-smooth-main) holds, whereas the orange line corresponds to the case where the data is generated from a nonsmooth model.
Figure 4: Type I and type II error rates with respect to the bin number and sample size. We consider two data-generating settings, with setting I (left) using the confounding graph in Fig. 1 (a), and setting II (right) using the mediation graph in Fig. 1 (b).
Figure 5: Illustration of causal discovery in sepsis disease. Observable variables are marked in gray. WBC denotes the White Blood Cell count, a common biomarker used to assess a patient's response to medicines. By using blood pressure as the proxy variable ($W$) for the health status ($U$), our goal is to determine whether the edge $\mathrm{Medicine} \to \mathrm{WBC}$ exists.
...and 1 more figures

Theorems & Definitions (35)

Remark 3.1
Example 4.2: ANM with completeness
Remark 4.3
Proposition 4.4
Remark 4.5
Remark 4.6
Example 4.8: ANM with TV smoothness
Proposition 4.9
Definition 4.10: Tight distribution
Example 4.12: ANM with tightness
...and 25 more
