
A Conditional Independence Test in the Presence of Discretization

Boyang Sun, Yu Yao, Guang-Yuan Hao, Yumou Qiu, Kun Zhang

TL;DR

A conditional independence test designed for settings in which some observed variables are discretizations of latent continuous variables. Bridge equations are used to recover the parameter reflecting the statistical information of the underlying latent continuous variables.

Abstract

Testing conditional independence has many applications, such as Bayesian network learning and causal discovery. Many test methods have been proposed, but existing methods generally fail when only discretized observations are available. Specifically, suppose $X_1$, $\tilde{X}_2$ and $X_3$ are the observed variables, where $\tilde{X}_2$ is a discretization of the latent variable $X_2$. Applying existing test methods to observations of $X_1$, $\tilde{X}_2$ and $X_3$ can lead to false conclusions about the underlying conditional independence of $X_1$, $X_2$ and $X_3$. Motivated by this, we propose a conditional independence test specifically designed to accommodate such discretization. To achieve this, we design bridge equations to recover the parameter reflecting the statistical information of the underlying latent continuous variables. We also derive an appropriate test statistic and its asymptotic distribution under the null hypothesis of conditional independence. Theoretical results and empirical validation demonstrate the effectiveness of our test.
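The failure mode described in the abstract can be illustrated with a small simulation. The sketch below is not the test proposed in the paper. It applies a standard Fisher-z partial-correlation test to a jointly Gaussian triple in which $X_1$ and $X_3$ are conditionally independent given $X_2$, first conditioning on the latent $X_2$ and then on a binarized copy $\tilde{X}_2$. The coefficients (0.8 and 0.6), the threshold at zero, the sample size, and all function names are assumptions chosen for illustration.

```python
import math

import numpy as np


def partial_corr(a, b, c):
    """Sample partial correlation of a and b given a single variable c."""
    r_ab = np.corrcoef(a, b)[0, 1]
    r_ac = np.corrcoef(a, c)[0, 1]
    r_bc = np.corrcoef(b, c)[0, 1]
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def fisher_z_pvalue(r, n, k=1):
    """Two-sided p-value of the Fisher-z test with k conditioning variables."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    # For a standard normal, 2 * (1 - Phi(stat)) equals erfc(stat / sqrt(2)).
    return math.erfc(stat / math.sqrt(2))


def simulate(n=5000, seed=0):
    """Compare conditioning on latent X2 with conditioning on its discretization."""
    rng = np.random.default_rng(seed)
    # Assumed generating process: X1 <- X2 -> X3, so X1 and X3 are
    # conditionally independent given X2 and all three are jointly Gaussian.
    x2 = rng.standard_normal(n)
    x1 = 0.8 * x2 + 0.6 * rng.standard_normal(n)
    x3 = 0.8 * x2 + 0.6 * rng.standard_normal(n)
    # Assumed discretization: binarize the latent variable at zero.
    x2_tilde = (x2 > 0).astype(float)

    r_latent = partial_corr(x1, x3, x2)
    r_discrete = partial_corr(x1, x3, x2_tilde)
    return {
        "r_latent": r_latent,
        "p_latent": fisher_z_pvalue(r_latent, n),
        "r_discrete": r_discrete,
        "p_discrete": fisher_z_pvalue(r_discrete, n),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions, the partial correlation given the latent $X_2$ stays close to zero, while the partial correlation given $\tilde{X}_2$ remains clearly positive, so the test applied to the discretized data rejects conditional independence even though it holds for the latent variables. The binarized variable retains only part of the information in $X_2$, which leaves residual dependence between $X_1$ and $X_3$.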

Paper Structure

This paper contains 45 sections, 9 theorems, 84 equations, 9 figures, 1 algorithm.

Key Result

Theorem 2.1

Let $X_1$, $X_2$ and $X_3$ be jointly Gaussian random variables that are mutually dependent and satisfy $X_1 \perp \!\!\! \perp X_3 \mid X_2$, and let $\tilde{X}_2 = f_j(g_j(X_2))$ be the discretized observation defined in the setting-formulation equation. Then the conditional independence between $X_1$ and $X_3$

Figures (9)

  • Figure 1: We illustrate data-generative processes with causal graphical models. The discretization process introduces new discrete variables indicated by a tilde ($\sim$).
  • Figure 2: Comparison of Type I error and calibrated Type II error ($1-\textup{power}$) for all three types of tested data (continuous, mixed, discrete), across different numbers of samples and cardinalities of the conditioning set. The suffix attached to a test's name denotes the cardinality of discretization; for example, "Fisherz_4" signifies the application of the Fisher-z test to data discretized into four levels. The chi-square test is only applicable to the discrete case.
  • Figure 3: Experimental results of skeleton discovery on synthetic data for changing sample size (a) and changing number of nodes (b). Fisherz_nodis is the Fisher-z test applied to the original continuous data. We evaluate $F_1$ ($\uparrow$), Precision ($\uparrow$), Recall ($\uparrow$) and SHD ($\downarrow$).
  • Figure 4: Experimental results of DAG discovery on synthetic data for changing sample size (a) and changing number of nodes (b). Fisherz_nodis is the Fisher-z test applied to the original continuous data. We evaluate $F_1$ ($\uparrow$), Precision ($\uparrow$), Recall ($\uparrow$) and SHD ($\downarrow$).
  • Figure 5: Experimental results of causal discovery on synthetic data with $p=8$ and $n \in \{100, 500, 2000, 5000\}$, where the data-generating process violates our assumptions: the data are either non-Gaussian (a), (b), (c) or generated by nonlinear relations (d). The figure reports $F_1$ ($\uparrow$), Precision ($\uparrow$), Recall ($\uparrow$) and SHD ($\downarrow$) on the skeleton.
  • ...and 4 more figures

Theorems & Definitions (11)

  • Theorem 2.1
  • Definition 3.1: Bridge Equation for A Discretized-Variable Pair
  • Definition 3.2: Bridge Equation for A Continuous-Discretized-Variable Pair
  • Theorem 3.3: Independence Test
  • Lemma 3.4
  • Lemma 3.5: $\bm{\psi}_{\hat{\bm{\theta}}}^l$ for A Discretized-Variable Pair
  • Lemma 3.6: $\bm{\psi}_{\hat{\bm{\theta}}}^l$ for A Continuous-Discretized-Variable Pair
  • Lemma 3.7
  • Theorem 3.8: Conditional Independence Test
  • Lemma B.1
  • ...and 1 more