Better Simulations for Validating Causal Discovery with the DAG-Adaptation of the Onion Method

Bryan Andrews, Erich Kummerfeld

TL;DR

This paper addresses the lack of standard simulation benchmarks for validating causal discovery algorithms (CDAs) by introducing the DAG-adaptation of the Onion (DaO) method, which uniformly samples correlation matrices $R$ that are Markov to a given DAG $G$. By specifying the distribution over $R$ directly, and by generating scale-free DAGs through rewiring ($SFi$-DAG for scale-free in-degree and $SFo$-DAG for scale-free out-degree), DaO provides a domain-free, parameter-free benchmark that avoids common simulation artifacts such as varsortability and $R^2$-sortability. The authors prove that DaO samples uniformly from the space of correlation matrices that are Markov to the DAG, and they provide open-source Python and R implementations. In comparative simulations against the ZARX and Tetrad designs, DaO produces distinct, more uniform model distributions, which illustrates how earlier simulation designs can spuriously favor certain causal discovery approaches. Overall, DaO is proposed as a principled, universal standard for evaluating CDAs and as groundwork for domain-specific extensions and larger-scale benchmarking efforts.

Abstract

The number of artificial intelligence algorithms for learning causal models from data is growing rapidly. Most "causal discovery" or "causal structure learning" algorithms are primarily validated through simulation studies. However, no widely accepted simulation standards exist, and publications often report conflicting performance statistics, even when only considering publications that simulate data from linear models. In response, several manuscripts have criticized a popular simulation design for validating algorithms in the linear case. We propose a new simulation design for generating linear models for directed acyclic graphs (DAGs): the DAG-adaptation of the Onion (DaO) method. DaO simulations are fundamentally different from existing simulations because they prioritize the distribution of correlation matrices rather than the distribution of linear effects. Specifically, the DaO method uniformly samples the space of all correlation matrices consistent with (i.e., Markov to) a DAG. We also discuss how to sample DAGs and present methods for generating DAGs with scale-free in-degree or out-degree. We compare the DaO method against two alternative simulation designs and provide implementations of the DaO method in Python and R: https://github.com/bja43/DaO_simulation. We advocate for others to adopt DaO simulations as a fair universal benchmark.
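
To make the contrast in the abstract concrete, the following minimal sketch shows the conventional "effects-first" route that DaO departs from: linear coefficients are drawn first and the correlation matrix is derived afterwards. This is not the DaO method. The function name `implied_correlation`, the coefficient range, and the unit error variances are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def implied_correlation(B, noise_var):
    """Correlation matrix implied by the linear model X = B X + e.

    B[i, j] is the linear effect of X_j on X_i (nonzero only if j -> i),
    and noise_var holds the variances of the independent error terms e.
    """
    d = B.shape[0]
    A = np.linalg.inv(np.eye(d) - B)       # X = A e
    cov = A @ np.diag(noise_var) @ A.T     # covariance of X
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)            # rescale to unit variances

rng = np.random.default_rng(0)

# Chain DAG 0 -> 1 -> 2 with coefficients drawn from an assumed range.
B = np.zeros((3, 3))
B[1, 0] = rng.uniform(0.5, 2.0)
B[2, 1] = rng.uniform(0.5, 2.0)

R = implied_correlation(B, np.ones(3))

# Markov to the chain: X_0 and X_2 are independent given X_1, so the
# (0, 2) entry of the inverse correlation matrix vanishes.
precision = np.linalg.inv(R)
print(np.round(R, 3))
print(abs(precision[0, 2]) < 1e-8)
```

In this design the distribution over $R$ is only an indirect consequence of how the coefficients and error variances are drawn, whereas the abstract describes DaO as sampling $R$ uniformly among the correlation matrices that are Markov to the DAG.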

Paper Structure (27 sections, 5 theorems, 28 equations, 17 figures, 4 tables, 9 algorithms)


Key Result

Lemma 1

If $R_i$ is positive definite, then $R_{i+1}$ is positive definite if and only if:

Figures (17)

  • Figure 1: Vertex in/out-degree distributions for 100 DAGs with $|V| = 100$ and $\alpha = 10$.
  • Figure 2: Edge matrices for ER/SFi/SFo-DAGs with $|V| = 100$ and $\alpha = \frac{99}{2}$ (density $\frac{1}{2}$).
  • Figure 3: Uniformly sampled correlation matrices from DAGs with $|V| = 3$ and $|E| = 2$ corresponding to the $1 < 2 < 3$ column of Table \ref{tab:er_dags}, with 100 repetitions for each case. These are 2D projections of a 3D space, so many points are occluded by other points closer to the viewer.
  • Figure 4: Properties of 10 models generated from ER/SFi/SFo-DAGs with $|V| = 100$ and $\alpha = 10$.
  • Figure 5: Properties of 100 models generated from ER-DAGs with $|V| = 100$ and $\alpha = 10$.
  • ...and 12 more figures
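
The ER-DAG settings in these captions can be illustrated with a minimal sketch, under the assumption that $\alpha$ is the average vertex degree, so that each vertex pair is joined with probability $\alpha / (|V| - 1)$. This reading is inferred from the Figure 2 caption, where $\alpha = \frac{99}{2}$ corresponds to density $\frac{1}{2}$ at $|V| = 100$. The function `sample_er_dag` is hypothetical and is not taken from the authors' repository; the scale-free rewiring variants (SFi/SFo) are not sketched here.

```python
import numpy as np

def sample_er_dag(num_vertices, avg_degree, rng):
    """Sample an Erdos-Renyi style DAG as a 0/1 adjacency matrix.

    adj[i, j] = 1 means i -> j. Each vertex pair receives an edge
    independently with probability avg_degree / (num_vertices - 1), and
    edges are oriented along a uniformly random vertex order, which
    guarantees acyclicity.
    """
    p = avg_degree / (num_vertices - 1)
    # Strictly upper-triangular mask: edges point from earlier to later.
    upper = np.triu(rng.random((num_vertices, num_vertices)) < p, k=1)
    order = rng.permutation(num_vertices)
    adj = np.zeros((num_vertices, num_vertices), dtype=int)
    adj[np.ix_(order, order)] = upper      # relabel vertices by the order
    return adj

rng = np.random.default_rng(0)
adj = sample_er_dag(100, 10, rng)          # |V| = 100, alpha = 10
in_degree = adj.sum(axis=0)
out_degree = adj.sum(axis=1)
print("mean total degree:", (in_degree + out_degree).mean())
```

The printed mean total degree should be close to $\alpha$ on average, since the expected number of edges is $\alpha |V| / 2$.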

Theorems & Definitions (6)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 5
  • Remark 6