Deep Nonparametric Conditional Independence Tests for Images

Marco Simnacher, Xiangnan Xu, Hani Park, Christoph Lippert, Sonja Greven

TL;DR

We introduce deep nonparametric CITs (DNCITs). They combine embedding maps, which extract feature representations of high-dimensional variables, with nonparametric CITs that remain applicable to these feature representations under varying confounder dimensions and confounder relationships.

Abstract

Conditional independence tests (CITs) test for conditional dependence between random variables. As existing CITs are limited in their applicability to complex, high-dimensional variables such as images, we introduce deep nonparametric CITs (DNCITs). The DNCITs combine embedding maps, which extract feature representations of high-dimensional variables, with nonparametric CITs applicable to these feature representations. For the embedding maps, we derive general properties on their parameter estimators to obtain valid DNCITs and show that these properties include embedding maps learned through (conditional) unsupervised or transfer learning. For the nonparametric CITs, appropriate tests are selected and adapted to be applicable to feature representations. Through simulations, we investigate the performance of the DNCITs for different embedding maps and nonparametric CITs under varying confounder dimensions and confounder relationships. We apply the DNCITs to brain MRI scans and behavioral traits, given confounders, of healthy individuals from the UK Biobank (UKB), confirming null results from a number of ambiguous personality neuroscience studies with a larger data set and with our more powerful tests. In addition, in a confounder control study, we apply the DNCITs to brain MRI scans and a confounder set to test for sufficient confounder control, leading to a potential reduction in the confounder dimension under improved confounder control compared to existing state-of-the-art confounder control studies for the UKB. Finally, we provide an R package implementing the DNCITs.
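The two-step structure described in the abstract (embed the image, then run a CIT on the features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R package: the embedding map is a hypothetical fixed random linear projection standing in for a pretrained network, and the test is a simple linear residual-permutation test swapped in for the nonparametric CITs used in the paper, so it is only reasonable when the confounders act linearly. All function names (`embed`, `residualize`, `cit_pvalue`, `simulate`) and the simulation design are invented for this example.

```python
import numpy as np


def embed(images, q, rng):
    """Step 1 (assumed stand-in): map each image to q features with a
    fixed random linear projection that does not depend on Y or Z."""
    flat = images.reshape(images.shape[0], -1)
    W = rng.standard_normal((flat.shape[1], q)) / np.sqrt(flat.shape[1])
    return flat @ W


def residualize(A, Z):
    """Residuals of a least-squares regression of A on Z plus an intercept."""
    D = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(D, A, rcond=None)
    return A - D @ coef


def cit_pvalue(X_feat, y, Z, n_perm=500, rng=None):
    """Step 2 (assumed stand-in): residual-permutation test of
    X_feat independent of y given Z, assuming linear confounding."""
    rng = np.random.default_rng() if rng is None else rng
    rx = residualize(X_feat, Z)
    ry = residualize(y.reshape(-1, 1), Z).ravel()

    def stat(r):
        # Sum of squared cross-products between feature and trait residuals.
        return np.sum((rx.T @ r) ** 2)

    observed = stat(ry)
    perms = np.array([stat(rng.permutation(ry)) for _ in range(n_perm)])
    return (1 + np.sum(perms >= observed)) / (1 + n_perm)


def simulate(c, n, rng, p=3, side=8):
    """Toy data: Z confounds image X and scalar Y; c=0 gives conditional
    independence, c>0 gives conditional dependence."""
    Z = rng.standard_normal((n, p))
    latent = Z @ np.ones(p) + rng.standard_normal(n)
    pattern = rng.standard_normal((side, side))
    noise = 0.1 * rng.standard_normal((n, side, side))
    images = latent[:, None, None] * pattern + noise
    y = Z @ np.ones(p) + c * latent + rng.standard_normal(n)
    return images, y, Z


rng = np.random.default_rng(0)
for c in (0.0, 1.0):
    images, y, Z = simulate(c, n=200, rng=rng)
    features = embed(images, q=5, rng=rng)
    p_value = cit_pvalue(features, y, Z, rng=rng)
    print(f"c={c}: p-value = {p_value:.3f}")
```

Keeping the embedding fixed before the test is run keeps the two steps separate, so the second step could be replaced by any CIT that accepts vector-valued inputs.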

Paper Structure

This paper contains 31 sections, 33 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: The DNCIT for an image $X$, a scalar $Y$ and a vector-valued confounder $Z=(Z_1,\hdots,Z_p)$. The black arrows indicate causal effects from $Z$ to $X$ and $Y$; the red lines represent the two steps of the DNCITs. In step one, $X$ is mapped through an embedding map $\omega$ to a vector-valued feature representation $X^\omega=(X^\omega_1,\hdots,X^\omega_q)$. In step two, a nonparametric CIT tests for conditional dependence between $X^\omega$ and $Y$ given $Z$.
  • Figure 2: The rejection rates of the DNCITs for $c=0$ (CI, top) and $c=1$ (no CI, bottom) for increasingly complex confounder relationships (columns). For each column, the sample size is increased from left to right. The confounder dimension is set to 6. Horizontal lines at 0 and $\alpha=0.05$.
  • Figure 3: The rejection rates of the DNCITs for $c=0$ (CI, top) and $c=1$ (no CI, bottom) for increasing confounder dimension (columns). For each column, the sample size is increased from left to right. The confounder relationship is set to $g_z(\mathbf{s})=(\mathbf{s}^\top,(s_j^2)_{j\in\mathcal{J}_c})$, where $\mathcal{J}_c$ denotes the index set of continuous variables. Horizontal lines at 0 and $\alpha=0.05$.
  • Figure 4: The logarithm of the runtime of the DNCITs for $c=0$ (CI) for increasing confounder dimension (columns). Each column increases the sample size from left to right. The confounder relationship is set to $g_z(\mathbf{s})=(\mathbf{s}^\top,(s_j^2)_{j\in\mathcal{J}_c})$, where $\mathcal{J}_c$ denotes the index set of continuous variables.
  • Figure 5: Left: $-\log_{10}(p)$ values of individual Wald tests for each trait-brain structure combination, sorted by size within each trait. The trait-structure combinations identified as significant in Avinun et al. (2020) are highlighted in black. Right: $-\log_{10}(p)$ values for the Freesurfer-WALD (left), Freesurfer-RCoT (center), and Fastsurfer-RCoT (right) tests across traits with all brain structures. Unicolored points and triangles represent curiosity, diligence, nervousness, neuroticism, sociability and warmth (left to right), while grey-filled points and triangles represent DNCITs for all BFBTs together. In both panels, tests with $-\log_{10}(p)>2.5$ are colored red and annotated. The solid vertical lines mark $-\log_{10}(\alpha_{\text{bonf}})$, where $\alpha_{\text{bonf}}$ is the individual significance level of 0.05 Bonferroni-adjusted for 642 (left) and 6 (right) tests, i.e. $\alpha_{\text{bonf}}=\frac{0.05}{642}$ and $\alpha_{\text{bonf}}=\frac{0.05}{6}$, respectively.
  • ...and 7 more figures
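
The Bonferroni thresholds quoted in the Figure 5 caption can be checked with a few lines of arithmetic. The helper name `bonferroni_threshold` is made up for this sketch; the inputs (an individual level of 0.05 and 642 or 6 tests) come from the caption.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Return the Bonferroni-adjusted level and its -log10 value."""
    alpha_bonf = alpha / n_tests
    return alpha_bonf, -math.log10(alpha_bonf)


for n_tests in (642, 6):
    alpha_bonf, line = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests} tests: alpha_bonf = {alpha_bonf:.2e}, -log10 = {line:.2f}")
```

The vertical lines therefore sit at roughly 4.11 (642 tests) and 2.08 (6 tests) on the $-\log_{10}(p)$ axis.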