Interpretable, multi-dimensional Evaluation Framework for Causal Discovery from observational i.i.d. Data

Georg Velev; Stefan Lessmann

TL;DR

The paper addresses the evaluation of causal discovery from i.i.d. observational data under non-identifiable nonlinear transformations. It introduces distance to optimal solution (DOS), a metric that aggregates six criteria covering structural accuracy and suitability for causal inference. DOS is paired with an interpretable, glass-box sensitivity analysis based on EBM (Explainable Boosting Machine), which studies 7 experimental factors across 14 CSL methods. The analysis indicates that amortized methods such as AVICI and order-based approaches such as R^2-SortnRegress perform robustly under challenging non-identifiable patterns. The large-scale benchmark (15,360 datasets) shows how data scale, graph type, and nonlinear transformation strength drive performance, and highlights varsortability as a key factor affecting gradient-based methods. The framework offers a holistic, interpretable benchmarking tool for causal discovery research and guides the design of robust evaluation protocols in realistic nonlinear settings.

Abstract

Nonlinear causal discovery from observational data imposes strict identifiability assumptions on the formulation of the structural equations used in the data generating process. Evaluating structure learning methods under violations of these assumptions requires a rigorous and interpretable approach that quantifies both the structural similarity of the estimate to the ground truth and the capacity of the discovered graphs to be used for causal inference. Motivated by the lack of a unified performance assessment framework, we introduce an interpretable, six-dimensional evaluation metric, i.e., distance to optimal solution (DOS), which is specifically tailored to the field of causal discovery. Furthermore, this is the first research to assess the performance of structure learning algorithms from seven different families on an increasing percentage of non-identifiable, nonlinear causal patterns inspired by real-world processes. Our large-scale simulation study, which incorporates seven experimental factors, shows that, besides causal order-based methods, amortized causal discovery delivers results with comparatively high proximity to the optimal solution.
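
The idea of a distance to an optimal solution can be illustrated with a minimal sketch. The exact DOS formula is not given in this summary, so the code below assumes a normalized Euclidean distance between a vector of six criterion scores and an ideal point. The function name `distance_to_optimal`, the scaling of every criterion to [0, 1] with 1 as the best value, and the example scores are all hypothetical and are not taken from the paper.

```python
import math
from typing import Optional, Sequence


def distance_to_optimal(
    scores: Sequence[float],
    optimal: Optional[Sequence[float]] = None,
) -> float:
    """Normalized Euclidean distance from a score vector to the ideal point.

    Assumption (not from the paper): every criterion is already scaled to
    [0, 1], where 1 is the best achievable value. The result then also lies
    in [0, 1], with 0 meaning the optimal solution was reached.
    """
    if optimal is None:
        # Ideal point: the best value on every criterion.
        optimal = [1.0] * len(scores)
    if len(scores) != len(optimal):
        raise ValueError("scores and optimal must have the same length")
    if not scores:
        raise ValueError("at least one criterion is required")
    for value in scores:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each score must lie in [0, 1]")

    squared_gap = sum((o - s) ** 2 for s, o in zip(scores, optimal))
    # Dividing by the number of criteria keeps the distance in [0, 1].
    return math.sqrt(squared_gap / len(scores))


# Hypothetical scores of one method on six criteria (illustrative values only).
example_scores = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85]
example_distance = distance_to_optimal(example_scores)
```

A smaller value means the method is closer to the ideal point on all criteria at once. Because each squared gap enters the sum separately, the per-criterion contributions remain inspectable, which is one way such an aggregate can stay interpretable.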

Paper Structure

This paper contains 20 sections, 29 equations, 14 figures, 4 tables, and 1 algorithm.

Introduction
Theoretical Background: Causal Discovery from i.i.d. observational Data
Related Work
Structural Causal Models
Causal Discovery Modelling Techniques
Simulation Framework and Evaluation Criteria
Related Benchmark Studies
Interpretable Multi-Criteria Evaluation Framework for Causal Discovery
Experimental Design
Large-Scale Causal Discovery Results
Conclusion
CI Relationships
ER and SF Graph Models
Benchmark Studies in the field of Causal Discovery for i.i.d. observational Data
One-dimensional Evaluation Criteria for Causal Discovery
...and 5 more sections

Figures (14)

Figure 1: Classification of CSL Methods
Figure 2: Connection between the two components of our performance assessment framework, i.e., DOS and EBM, visualized based on requirements w.r.t. comprehensive evaluation as well as interpretability.
Figure 3: Average number of sampled edges for a) ER graphs and b) SF graphs
Figure 4: EBM importance scores, a) for each of the seven experimental factors in our simulation framework, and b) for the interaction effects estimated by EBM between a pair of experimental factors
Figure 5: a) Sensitivity of causal discovery techniques to varying node sizes, b) interaction effects between node size and sample size, c) sensitivity to different edge density levels, and d) interaction effects between connectivity and node size.
...and 9 more figures
