Interpretable, multi-dimensional Evaluation Framework for Causal Discovery from observational i.i.d. Data

Georg Velev, Stefan Lessmann

TL;DR

The paper addresses the evaluation of causal discovery from i.i.d. observational data when nonlinear transformations violate identifiability assumptions. It introduces distance to optimal solution (DOS), a metric that aggregates six criteria covering structural accuracy and suitability for causal inference. DOS is paired with an interpretable, glass-box sensitivity analysis based on an Explainable Boosting Machine (EBM), which studies 7 experimental factors across 14 causal structure learning (CSL) methods. The analysis indicates that amortized methods such as AVICI and order-based approaches such as R^2-SortnRegress remain robust under challenging non-identifiable patterns. The large-scale benchmark (15,360 datasets) shows how data scale, graph type, and nonlinear transformation strength drive performance, and highlights varsortability as a key factor affecting gradient-based methods. The framework offers a holistic, interpretable benchmarking tool for causal discovery research and can guide the design of robust evaluation protocols in realistic nonlinear settings.

Abstract

Nonlinear causal discovery from observational data imposes strict identifiability assumptions on the formulation of the structural equations used in the data-generating process. The evaluation of structure learning methods under assumption violations requires a rigorous and interpretable approach, which quantifies both the structural similarity of the estimate to the ground truth and the capacity of the discovered graphs to be used for causal inference. Motivated by the lack of a unified performance assessment framework, we introduce an interpretable, six-dimensional evaluation metric, i.e., distance to optimal solution (DOS), which is specifically tailored to the field of causal discovery. Furthermore, this is the first study to assess the performance of structure learning algorithms from seven different families on an increasing percentage of non-identifiable, nonlinear causal patterns inspired by real-world processes. Our large-scale simulation study, which incorporates seven experimental factors, shows that, besides causal order-based methods, amortized causal discovery delivers results with comparatively high proximity to the optimal solution.
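
The idea of measuring proximity to an optimal solution across several criteria can be illustrated with a minimal sketch. The abstract does not give the DOS formula, so everything below is an assumption made for illustration: each of the six criteria is taken to be rescaled to [0, 1] with 1 as the best value, the optimum is the all-ones point, and the distance is Euclidean. The function name `distance_to_optimum` is hypothetical and is not taken from the paper.

```python
import math


def distance_to_optimum(scores, optimum=None):
    """Euclidean distance between a score vector and an ideal point.

    Assumption (not from the paper): every criterion is rescaled to
    [0, 1] with 1 being the best achievable value, so the ideal point
    defaults to a vector of ones.
    """
    if optimum is None:
        optimum = [1.0] * len(scores)
    if len(scores) != len(optimum):
        raise ValueError("scores and optimum must have the same length")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(scores, optimum)))


# Hypothetical six-criterion score vectors for two methods.
perfect = distance_to_optimum([1.0] * 6)  # coincides with the ideal point
worst = distance_to_optimum([0.0] * 6)    # farthest corner of the unit cube
mixed = distance_to_optimum([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
```

Under these assumptions a smaller value means the method lies closer to the ideal point on all six criteria at once, which is what lets a single number summarize several dimensions without hiding which ones are weak.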

Paper Structure (20 sections, 29 equations, 14 figures, 4 tables, 1 algorithm)

Figures (14)

  • Figure 1: Classification of CSL Methods
  • Figure 2: Connection between the two components of our performance assessment framework, i.e., DOS and EBM, visualized based on requirements w.r.t. comprehensive evaluation as well as interpretability.
  • Figure 3: Sampled number of edges on average for a) ER graphs, and b) SF graphs
  • Figure 4: EBM importance scores, a) for each of the seven experimental factors in our simulation framework, and b) for the interaction effects estimated by EBM between a pair of experimental factors
  • Figure 5: a) Sensitivity of causal discovery techniques to varying node sizes, b) interaction effects between node size and sample size, c) sensitivity to different edge density levels, and d) interaction effects between connectivity and node size.
  • ...and 9 more figures