CausalProfiler: Generating Synthetic Benchmarks for Rigorous and Transparent Evaluation of Causal Machine Learning

Panayiotis Panayiotou, Audrey Poinsot, Alessandro Leite, Nicolas Chesneau, Marc Schoenauer, Özgür Şimşek

TL;DR

CausalProfiler tackles the weak and opaque evaluation practices in causal ML by introducing a principled synthetic benchmark generator defined through Spaces of Interest that jointly specify causal models, queries, and data. It provides ground-truth causal queries across L1–L3, explicit sampling strategies for SCMs, and validation of ground-truth consistency, enabling robust, repeatable, and assumption-aware method evaluation. The framework demonstrates increased diversity over existing benchmarks and reveals that method performance substantially depends on the underlying causal mechanisms and data regimes. While synthetic benchmarks cannot replace real-data studies, CausalProfiler offers a rigorous complement to uncover failure modes, robustness to assumption violations, and guidance for method development.

Abstract

Causal machine learning (Causal ML) aims to answer "what if" questions using machine learning algorithms, making it a promising tool for high-stakes decision-making. Yet, empirical evaluation practices in Causal ML remain limited. Existing benchmarks often rely on a handful of hand-crafted or semi-synthetic datasets, leading to brittle, non-generalizable conclusions. To bridge this gap, we introduce CausalProfiler, a synthetic benchmark generator for Causal ML methods. Based on a set of explicit design choices about the class of causal models, queries, and data considered, the CausalProfiler randomly samples causal models, data, queries, and ground truths constituting the synthetic causal benchmarks. In this way, Causal ML methods can be rigorously and transparently evaluated under a variety of conditions. This work offers the first random generator of synthetic causal benchmarks with coverage guarantees and transparent assumptions operating on the three levels of causal reasoning: observation, intervention, and counterfactual. We demonstrate its utility by evaluating several state-of-the-art methods under diverse conditions and assumptions, both in and out of the identification regime, illustrating the types of analyses and insights the CausalProfiler enables.
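To make the generation loop described in the abstract concrete, here is a minimal sketch of sampling a causal model, data, a query, and a ground truth. It is an illustration under strong assumptions, not the CausalProfiler implementation: variables are binary, the graph is Markovian (no hidden confounders), the query is an average treatment effect, and the ground truth is approximated by Monte Carlo on the intervened model. All function names and parameters (`sample_dag`, `sample_mechanisms`, `sample_unit`, `make_benchmark`, `n_truth`) are hypothetical.

```python
import random

def sample_dag(n_vars, edge_prob, rng):
    """Return {child: [parents]}; edges only run from lower to higher index, so the graph is acyclic."""
    return {j: [i for i in range(j) if rng.random() < edge_prob] for j in range(n_vars)}

def sample_mechanisms(parents, rng):
    """For each binary variable, draw P(X = 1 | parent configuration) uniformly at random."""
    return {v: [rng.random() for _ in range(2 ** len(pa))] for v, pa in parents.items()}

def sample_unit(parents, mechanisms, rng, do=None):
    """Draw one unit in index (topological) order; variables listed in `do` are forced."""
    do = do or {}
    values = {}
    for v in sorted(parents):
        if v in do:
            values[v] = do[v]  # intervention: ignore the mechanism
            continue
        config = 0
        for p in parents[v]:
            config = 2 * config + values[p]  # encode parent values as a table index
        values[v] = int(rng.random() < mechanisms[v][config])
    return values

def make_benchmark(seed, n_vars=5, edge_prob=0.4, n_samples=500, n_truth=20000):
    """Sample one toy causal dataset: a query, its approximate ground truth, and observational data."""
    rng = random.Random(seed)
    parents = sample_dag(n_vars, edge_prob, rng)
    mechanisms = sample_mechanisms(parents, rng)
    treatment, outcome = sorted(rng.sample(range(n_vars), 2))
    data = [sample_unit(parents, mechanisms, rng) for _ in range(n_samples)]

    def mean_outcome(t):
        # Monte Carlo estimate of E[outcome | do(treatment = t)]
        draws = (sample_unit(parents, mechanisms, rng, do={treatment: t})[outcome]
                 for _ in range(n_truth))
        return sum(draws) / n_truth

    ground_truth = mean_outcome(1) - mean_outcome(0)
    return {"query": (treatment, outcome), "ground_truth": ground_truth, "data": data}
```

A method under evaluation would receive only `data` and `query`, and its estimate would be compared against `ground_truth`. Fixing the seed makes each sampled benchmark reproducible.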

Paper Structure

This paper contains 72 sections, 1 theorem, 10 equations, 15 figures, 23 tables, 12 algorithms.

Key Result

proposition 5.1

For a Space of Interest $\mathcal{S} = \{ \mathbb{M}, \mathbb{Q}, \mathbb{D} \}$ whose class of Structural Causal Models is a class of Regional Discrete SCMs$^1$ with the maximum number of noise regions, denoted $\mathbb{M}_{\texttt{RD-SCM},r=R_{\max}}$, any causal dataset $\mathcal{D} = \{Q, Q^\star, D, \dots$

$^1$ A formal definition can be found in Appendix app:regional_scms.

Figures (15)

  • Figure 1: Two-dimensional t-SNE plots of CausalProfiler's benchmarks (green) and established benchmarks (red), characterized by metrics from the analysis module.
  • Figure 2: Causal graph of the price elasticity example.
  • Figure 3: Average degree of the causal graphs of the generated SCMs depending on the expected edge probability. Observation: The average degree corresponds on average to the degree of the generated causal graphs.
  • Figure 4: Variance of the degree of the causal graphs of the generated SCMs depending on the number of variables and the expected edge probability. Observation: The variance of the degree increases with the size of the graph and its density.
  • Figure 5: Average causal path length of the causal graphs of the generated SCMs depending on the number of variables and the expected edge probability. Observation: The length of causal paths increases with the size of the causal graph and its density.
  • ...and 10 more figures
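
The relation between edge probability and graph degree that Figures 3 and 4 examine can be checked on a simple random-DAG sampler. The sketch below assumes each ordered pair of variables receives an edge independently with probability `edge_prob`; this is an illustrative sampler, not necessarily the one CausalProfiler uses, and the function names are hypothetical. Under this assumption the expected average degree of a graph with `n_vars` variables is `edge_prob * (n_vars - 1)`.

```python
import random

def sample_dag_edges(n_vars, edge_prob, rng):
    """Sample edges i -> j for i < j, each kept independently with probability edge_prob."""
    return [(i, j) for j in range(n_vars) for i in range(j) if rng.random() < edge_prob]

def average_degree(n_vars, edges):
    """Average total degree (in-degree plus out-degree) per node."""
    return 2 * len(edges) / n_vars

def mean_average_degree(n_vars, edge_prob, n_graphs, seed=0):
    """Average the per-graph average degree over n_graphs sampled DAGs."""
    rng = random.Random(seed)
    total = sum(average_degree(n_vars, sample_dag_edges(n_vars, edge_prob, rng))
                for _ in range(n_graphs))
    return total / n_graphs
```

Calling `mean_average_degree(10, 0.3, 2000)` should return a value close to `0.3 * 9 = 2.7`, the expected average degree under this sampler.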

Theorems & Definitions (9)

  • definition 4.1: Causal Dataset
  • definition 5.1: Space of Interest
  • proposition 5.1: Coverage
  • definition B.1: Semi-Markovian and Markovian SCMs
  • definition B.2: Causal Graph of a Semi-Markovian SCM
  • definition E.1
  • definition E.2
  • definition K.1
  • definition K.2