CausalPlayground: Addressing Data-Generation Requirements in Cutting-Edge Causality Research
Andreas W M Sauter, Erman Acar, Aske Plaat
TL;DR
CausalPlayground is a Python library that addresses the fragmented landscape of causal data generation. It provides fine-grained control over structural causal models (SCMs), arbitrary interventions, online interaction with SCMs through Gymnasium, and scalable generation of datasets of SCMs. The paper formalizes the relevant SCM concepts, describes the library's modular implementation, and demonstrates a use case in benchmarking causal discovery algorithms, with an emphasis on reproducibility and comparability. It identifies a gap in existing tools and positions CausalPlayground as a standardized, shareable framework, with broader model support and hardware-accelerated scaling as possible extensions. Overall, the aim is to accelerate causality research by enabling consistent data-generation workflows and online model interaction.
Abstract
Research on causal effects often relies on synthetic data due to the scarcity of real-world datasets with ground-truth effects. Since current data-generating tools do not always meet all requirements for state-of-the-art research, ad-hoc methods are often employed. This leads to heterogeneity among datasets and delays research progress. We address the shortcomings of current data-generating libraries by introducing CausalPlayground, a Python library that provides a standardized platform for generating, sampling, and sharing structural causal models (SCMs). CausalPlayground offers fine-grained control over SCMs and interventions, and supports generating datasets of SCMs for learning and quantitative research. Furthermore, by integrating with Gymnasium, the standard framework for reinforcement learning (RL) environments, we enable online interaction with the SCMs. Overall, by introducing CausalPlayground we aim to foster more efficient and comparable research in the field. All code and API documentation are available at https://github.com/sa-and/CausalPlayground.
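
To make the notions of sampling from an SCM and applying an intervention concrete, the following is a minimal sketch in plain Python. It does not use the CausalPlayground API. The function `sample_scm`, the two-variable model, and the noise scales are all hypothetical choices made for illustration only.

```python
import random


def sample_scm(n, do_x=None, seed=0):
    """Draw n samples from a toy SCM (illustrative, not the library's API).

    Structural equations (assumed for this sketch):
        X := U_x            with U_x ~ N(0, 1)
        Y := 2 * X + U_y    with U_y ~ N(0, 0.1^2)

    If do_x is given, the equation for X is replaced by the constant
    do_x, i.e. the hard intervention do(X = do_x).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u_x = rng.gauss(0.0, 1.0)   # exogenous noise of X
        u_y = rng.gauss(0.0, 0.1)   # exogenous noise of Y
        x = u_x if do_x is None else do_x
        y = 2.0 * x + u_y           # Y's mechanism is unchanged by do(X)
        samples.append((x, y))
    return samples


observational = sample_scm(1000)
interventional = sample_scm(1000, do_x=3.0)

# Under do(X = 3), Y is centred near 2 * 3 = 6.
mean_y = sum(y for _, y in interventional) / len(interventional)
```

The intervention replaces only the mechanism of `X` and leaves the mechanism of `Y` intact, which is what distinguishes interventional data from data obtained by filtering observational samples.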
