Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency

Alan Nawzad Amin; Andrew Gordon Wilson

TL;DR

The paper addresses scalable causal discovery for systems with thousands of variables by introducing the Differentiable Adjacency Test (DAT), which converts the discrete separating-set search into a differentiable optimization problem. DAT-Graph builds on DAT to learn large-scale causal graphs by first constructing a sparse moral graph and then performing two DAT-based adjacency tests to infer the skeleton, with extensions to learn from intervention data. Empirical results show that DAT-Graph scales to around 1000 variables, achieves competitive or superior skeleton accuracy compared to gradient-based baselines, and improves downstream intervention predictions in RNA sequencing data, especially when combined with hybrid modeling. Overall, the work offers a practical, scalable framework for reliable causal discovery in high-dimensional, complex systems and highlights the potential for integrating testing-based pruning with gradient-based model search to enhance performance.

Abstract

To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large-scale data. Unfortunately, the space of all possible causal graphs is enormous, so scalably and accurately searching for the best fit to the data is a challenge. In principle, we could substantially decrease the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding if two variables are adjacent in a causal graph may require an exponential number of tests. Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). DAT replaces an exponential number of tests with a provably equivalent relaxed problem. It then solves this problem by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state-of-the-art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large-scale RNA sequencing data.
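
To illustrate why a brute-force adjacency check is costly, here is a minimal Python sketch (not taken from the paper) that enumerates every candidate separating set for a pair of variables. With M other variables there are 2^M candidate sets, each of which would need its own conditional independence test. The function names `candidate_separating_sets` and `count_candidate_sets` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def candidate_separating_sets(other_vars):
    """Yield every subset of other_vars, smallest subsets first.

    Each subset is one candidate conditioning set that a brute-force
    adjacency check would have to test.
    """
    other_vars = list(other_vars)
    for size in range(len(other_vars) + 1):
        for subset in combinations(other_vars, size):
            yield set(subset)


def count_candidate_sets(num_other_vars):
    """Count the candidate sets for a given number of other variables."""
    return sum(1 for _ in candidate_separating_sets(range(num_other_vars)))
```

The count doubles with every additional variable, which is the exponential search that the abstract says DAT replaces with a relaxed optimization problem.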

Paper Structure (49 sections, 6 theorems, 38 equations, 15 figures, 4 tables)

Introduction
Related work
Background
The differentiable adjacency test (DAT)
Relaxing the discrete search
Hardness of adjacency testing
Differentiable Adjacency Test
DAT is accurate and efficient in practice
Learning a graph with DAT (DAT-Graph)
Learning the moral graph
Testing adjacency with two DATs
Computational cost of DAT-Graph
Learning from data with interventions
Experiments
Learning from observational data
...and 34 more sections

Key Result

Theorem 4.3

(Proof in the appendix.) Under the tail assumption (pick $f_m$ to have thicker tails than $p$), the separating set selection problem and the separating representation search problem have the same answer. If $\tilde{Z}_{\psi^*, 1:M}$ is a separating representation then $\{Z_m\}_{\

Figures (15)

Figure 1: Our Differentiable Adjacency Test (DAT) relaxes a discrete search requiring exponentially many tests (a) into a differentiable search to solve an optimization problem (b).
Figure 2: DAT enables learning large graphs by solving the separating set selection problem accurately and efficiently. (a) We plot how accurately each method determines if two variables are adjacent (AUC) and how often it correctly identifies a separating set for two non-adjacent variables (P(Separating)) against the number of variables ($M$). (b) We plot the running time of each method against the number of variables ($M$), and the time it would take to use each method to learn a large graph against the size of the graph ($N$). We plot the mean and standard error across 3 replicates.
Figure 3: DAT-Graph learns large graphs accurately. We plot the mean error (SHD) and standard error against the size of the graph ($N$) across 3 replicates.
Figure 4: DAT-Graph learns sparser graphs more accurately. We plot the mean and standard error against the sparsity of the graph across 3 replicates. The legend is the same as that of Fig. \ref{fig:observation}.
Figure 5: DAT-Graph learns more accurate graphs when given intervention data. We plot the mean and standard error against the number of variables with interventions across 5 replicates.
...and 10 more figures

Theorems & Definitions (20)

Definition 3.1
Example 4.2
proof
Theorem 4.3
Proposition 4.4
Definition 5.1
Proposition 5.2
Example 5.3
Proposition 5.1
proof
...and 10 more
