Amortized Conditional Independence Testing

Bao Duong, Nu Hoang, Thin Nguyen

TL;DR

This work tackles conditional independence (CI) testing for continuous, high-dimensional data by replacing handcrafted test statistics with an amortized, data-driven approach. It introduces ACID, a transformer-based network trained on synthetically generated, labeled datasets to predict the CI outcome directly from an entire dataset $\mathcal{D}$. Its dataset-level encoder applies self-attention over dimensions, cross-attention from $Z$ to $X$ and $Y$, and self-attention over samples. The model outputs a logit for $p_{\theta}(\mathcal{G} \mid \mathcal{D})$, and a skewed-normal null distribution is used to compute $p$-values, enabling fast, scalable CI testing. Empirically, ACID achieves state-of-the-art performance on synthetic and real data, generalizes to unseen sample sizes and dimensionalities, and has near-zero inference cost, making it well suited for causal discovery and related applications.
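
As a rough illustration of the last step, the sketch below turns a scalar score into a $p$-value under a skew-normal null. It is not the authors' implementation: the parameter names `shape`, `loc`, and `scale`, the example numbers, and the convention that larger logits mean stronger evidence of conditional dependence (so the $p$-value is an upper-tail probability) are all assumptions made for illustration. The tail integral is evaluated with Simpson's rule using only the standard library.

```python
import math


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def skew_normal_sf(x, shape, loc=0.0, scale=1.0, n_steps=4000):
    """Upper-tail probability P(T >= x) of a skew-normal variable T.

    The standardized density is 2 * phi(t) * Phi(shape * t); it is integrated
    from the standardized x up to a point far in the right tail with
    Simpson's rule (n_steps must be even).
    """
    def density(t):
        return 2.0 * std_normal_pdf(t) * std_normal_cdf(shape * t)

    lo = (x - loc) / scale
    hi = max(lo, 0.0) + 12.0  # the density is negligible beyond this point
    h = (hi - lo) / n_steps
    total = density(lo) + density(hi)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * density(lo + i * h)
    return min(1.0, total * h / 3.0)


# Hypothetical null parameters and logit, chosen only for illustration.
null_shape, null_loc, null_scale = 2.0, -1.0, 1.5
logit = 3.2
p_value = skew_normal_sf(logit, null_shape, null_loc, null_scale)
reject_independence = p_value < 0.05  # assumed significance level
```

With `shape=0` the function reduces to the usual normal upper tail, which gives a quick sanity check on the integration.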

Abstract

Testing for the conditional independence structure in data is a fundamental task in statistics and machine learning, with natural applications in causal discovery, a problem highly relevant to many scientific disciplines. Existing methods design explicit test statistics that quantify the degree of conditional dependence; this is highly challenging, and such statistics can neither capture nor exploit prior knowledge in a data-driven manner. This study introduces an entirely new approach: we propose to amortize conditional independence testing and devise ACID, a novel transformer-based neural network architecture that learns to test for conditional independence. ACID can be trained on synthetic data in a supervised fashion, and the learned model can then be applied to any dataset of a similar nature or adapted to new domains by fine-tuning at negligible computational cost. Our extensive empirical evaluations on both synthetic and real data show that ACID consistently achieves state-of-the-art performance against existing baselines under multiple metrics, and generalizes robustly to unseen sample sizes, dimensionalities, and non-linearities, with remarkably low inference time.
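
The supervised setup described above needs whole datasets paired with a known label. The sketch below shows one way such labeled synthetic datasets could be produced. The generative process (a scalar $Z$ driving $X$ and $Y$ through `tanh` and `sin`, Gaussian noise, and an optional direct $X \to Y$ link) and the function names are hypothetical and chosen for illustration; they are not the simulator used in the paper.

```python
import math
import random


def simulate_dataset(n_samples, dependent, rng):
    """Return n_samples rows (x, y, z).

    X and Y both depend on Z. When `dependent` is True, Y additionally
    depends on X directly, so X and Y are no longer independent given Z.
    """
    rows = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        x = math.tanh(z) + 0.5 * rng.gauss(0.0, 1.0)
        y = math.sin(z) + 0.5 * rng.gauss(0.0, 1.0)
        if dependent:
            y += x  # direct X -> Y link breaks conditional independence
        rows.append((x, y, z))
    return rows


def make_training_batch(n_datasets, n_samples, seed=0):
    """Return a list of (dataset, label) pairs.

    Assumed label convention: 1 = conditionally dependent,
    0 = conditionally independent.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(n_datasets):
        label = rng.randint(0, 1)
        batch.append((simulate_dataset(n_samples, bool(label), rng), label))
    return batch


batch = make_training_batch(n_datasets=8, n_samples=50, seed=1)
```

Because the label is fixed by construction, each simulated dataset can serve as one training example for a classifier that takes a dataset as input.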

Paper Structure

This paper contains 25 sections, 16 equations, 5 figures.

Figures (5)

  • Figure 1: Overview of the proposed ACID architecture.
  • Figure 2: The learning process of ACID. A validation dataset is not needed because no dataset is seen twice during training. Shaded areas depict standard deviations.
  • Figure 3: Conditional Independence Testing performance on synthetic data. The evaluation metrics are AUC and $F_{1}$ score (higher is better), and Type I and Type II errors (lower is better). Error bars are 95% confidence intervals.
  • Figure 4: Conditional Independence Testing performance in out-of-distribution settings. The performance metrics are AUC and $F_{1}$ score (higher is better), and Type I and Type II errors (lower is better). Error bars are 95% confidence intervals.
  • Figure 5: Conditional Independence Testing performance on Real data (the Sachs dataset; Sachs et al., 2005). Time is measured on an Apple M1 CPU with 8 GB of RAM.
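
The four metrics named in the captions of Figures 3 and 4 can be computed from a set of test outcomes as in the sketch below. This is an illustrative sketch, not the paper's evaluation code: the function name, the label convention (1 = conditionally dependent, 0 = conditionally independent), the significance level `alpha=0.05`, and the toy numbers are assumptions.

```python
def evaluate_ci_tests(p_values, labels, alpha=0.05):
    """Compute Type I error, Type II error, F1 and AUC for a batch of CI tests.

    labels: 1 = conditionally dependent (H1 true), 0 = independent (H0 true).
    A test rejects H0 when its p-value is below alpha.
    """
    rejected = [p < alpha for p in p_values]
    tp = sum(1 for r, y in zip(rejected, labels) if r and y == 1)
    fn = sum(1 for r, y in zip(rejected, labels) if not r and y == 1)
    fp = sum(1 for r, y in zip(rejected, labels) if r and y == 0)
    tn = sum(1 for r, y in zip(rejected, labels) if not r and y == 0)

    type_i = fp / (fp + tn) if fp + tn else float("nan")   # false rejections
    type_ii = fn / (fn + tp) if fn + tp else float("nan")  # missed dependences
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else float("nan")

    # AUC: chance that a dependent case gets a smaller p-value than an
    # independent one (ties count one half). It does not depend on alpha.
    pos = [p for p, y in zip(p_values, labels) if y == 1]
    neg = [p for p, y in zip(p_values, labels) if y == 0]
    wins = sum(
        1.0 if pp < pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"type_i": type_i, "type_ii": type_ii, "f1": f1, "auc": auc}


# Toy inputs for illustration only.
metrics = evaluate_ci_tests([0.01, 0.20, 0.03, 0.60], [1, 1, 0, 0])
```

Type I and Type II errors and $F_{1}$ depend on the chosen threshold, whereas AUC summarizes how well the $p$-values rank dependent cases ahead of independent ones across all thresholds.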